Abstract

We present and analyze a wait-free deterministic algorithm for solving the at-most-once problem:
how $m$ shared-memory fail-prone processes perform asynchronously $n$ tasks at most once. Our
algorithmic strategy provides for the first time nearly optimal effectiveness, which is a measure that
expresses the total number of tasks completed in the worst case. The effectiveness of our algorithm
equals $n - 2m + 2$. This is up to an additive factor of $m$ close to the known effectiveness upper
bound $n - m + 1$ over all possible algorithms and improves on the previously best known deterministic
solutions that have effectiveness only $n - \log m \cdot o(n)$. We also present an iterated version of
our algorithm that for any $m = O(\sqrt[3+\epsilon]{n/\log n})$ is both effectiveness-optimal and
work-optimal, for any constant $\epsilon > 0$. We then employ this algorithm to provide a new explicit
algorithmic solution for the Write-All problem which is work optimal for any
$m = O(\sqrt[3+\epsilon]{n/\log n})$, improving the previously best known result of
$m = O(\sqrt[4]{n/\log n})$.

Keywords: At-most-once problem, Write-All, I/O automata, asynchronous shared memory.

1 Introduction

The at-most-once problem for asynchronous shared memory systems was introduced by Kentros et al. [15]
as the problem of performing a set of $n$ jobs by $m$ fail-prone processes while maintaining at-most-once
semantics.

The at-most-once semantic for object invocation ensures that an operation accessing and altering the
state of an object is performed no more than once. This semantic is among the standard semantics for
remote procedure calls (RPC) and method invocations and it provides important means for reasoning about
the safety of critical applications. Uniprocessor systems may trivially provide solutions for at-most-once
semantics by implementing a central schedule for operations. The problem becomes very challenging for
autonomous processes in a system with concurrent invocations on multiple objects.

Perhaps the most important question in this area is devising algorithms for the at-most-once problem
with good effectiveness. The complexity measure of effectiveness [15] describes the number of jobs
completed (at-most-once) by an implementation, as a function of the overall number of jobs $n$, the number
of processes $m$, and the number of crashes $f$. The only deterministic solutions known exhibit very low
effectiveness $\left(n^{1/\log m} - 1\right)^{\log m}$ (see [15]), which for most choices of the parameters
is very far from optimal (unless $m = O(1)$). Contrary to this, the present work presents the first
deterministic algorithm for the at-most-once problem which is optimal up to additive factors of $m$.
Specifically our effectiveness is $n - (2m - 2)$, which comes within an additive factor of $m$ of the
known ([15]) upper bound $n - m + 1$ on effectiveness over all possible algorithms. We also demonstrate how
to construct an algorithm which has effectiveness $n - O(m^2 \log n \log m)$ and work complexity
$O(n + m^{3+\epsilon} \log n)$, and is both effectiveness and work optimal when
$m = O(\sqrt[3+\epsilon]{n/\log n})$, for any constant $\epsilon > 0$. Work complexity counts the total
number of basic operations performed by the processes. Finally we show how to use this algorithm in order
to solve the Write-All problem [14] with work complexity $O(n + m^{3+\epsilon} \log n)$, which improves on
the best known explicit result by Malewicz [24] that has work complexity $O(n + m^4 \log n)$.

Related Work: A wide range of works study at-most-once semantics in a variety of settings. At-most-once
message delivery [?, 27] and at-most-once semantics for RPC [4, 19, 20, 21, 25] are two areas that
have attracted a lot of attention. Both in at-most-once message delivery and RPCs, we have two entities
(sender/client and receiver/server) that communicate by message passing. Any entity may fail and recover
and messages may be delayed or lost. In the first case one wants to guarantee that duplicate messages will
not be accepted by the receiver, while in the case of RPCs, one wants to guarantee that the procedure called
in the remote server will be invoked at-most-once [26].

In Kentros et al. [15], the at-most-once problem for asynchronous shared memory systems and the
correctness properties to be satisfied by any solution were defined. The first algorithms that solve the
at-most-once problem were provided and analyzed. Specifically they presented two algorithms that solve the
at-most-once problem for two processes with optimal effectiveness and a multi-process algorithm, that
employs a two-process algorithm as a building block, and solves the at-most-once problem with effectiveness
$n - \log m \cdot o(n)$ and work complexity $O(n + m \log m)$. Subsequently Hillel [13] provided a
probabilistic algorithm in the same setting with optimal effectiveness and expected work complexity
$O(nm^2 \log m)$ by employing a probabilistic multi-valued consensus protocol as a building block.

Di Crescenzo and Kiayias in [6] (and later Fitzi et al. [?]) demonstrate the use of the semantic in message
passing systems for the purpose of secure communication. Driven by the fundamental security requirements 
of one-time pad encryption, the authors partition a common random pad among multiple communicating 
parties. Perfect security can be achieved only if every piece of the pad is used at most once. The authors 
show how the parties maintain security while maximizing efficiency by applying at-most-once semantics on 
pad expenditure. 

One can also relate the at-most-once problem to the consensus problem [7, 12, 23, 17]. Indeed, consensus
can be viewed as an at-most-once distributed decision. Another related problem is process renaming,
see Attiya et al. [?], where each process identifier should be assigned to at most one process.

The at-most-once problem also has many similarities with the Write-All problem for the shared memory
model [?, 23]. First presented by Kanellakis and Shvartsman [14], the Write-All problem is
concerned with performing each task at-least-once. Most of the solutions for the Write-All problem exhibit
super-linear work when $m \ll n$. Malewicz [24] was the first to present a solution for the Write-All
problem that has linear work for a non-trivial number of processors. The algorithm presented by Malewicz
[24] has work $O(n + m^4 \log n)$ and uses test-and-set operations. Later Kowalski and Shvartsman [16]
presented a solution for the Write-All problem that for any constant $\epsilon$ has work
$O(n + m^{2+\epsilon})$. Their algorithm uses a collection of $q$ permutations with contention
$O(q \log q)$ for a properly chosen constant $q$.

We note that the at-most-once problem becomes much simpler when shared memory is supplemented by
some type of read-modify-write operations. For example, one can associate a test-and-set bit with each task,
ensuring that the task is assigned to the only process that successfully sets the shared bit. An
effectiveness-optimal implementation can then be easily obtained from any Write-All solution. Thus, in this
paper we deal only with the more challenging setting where algorithms use atomic read/write registers.
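
To illustrate the remark on read-modify-write primitives, the following is a minimal sketch in which each job is guarded by one test-and-set bit. It is not part of the algorithms of this paper: the class `TestAndSetBit`, the function `run_workers`, and the use of a lock to simulate the atomic primitive are all assumptions made for the illustration, and the naive scan of all jobs by every worker stands in for an arbitrary Write-All schedule.

```python
import threading


class TestAndSetBit:
    """One shared bit with an atomic test-and-set (simulated here with a lock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = False

    def test_and_set(self):
        # Atomically return the old value and set the bit.
        with self._lock:
            old = self._value
            self._value = True
            return old


def run_workers(n_jobs, n_workers):
    """Every worker scans all jobs; a job is performed only by the bit's winner."""
    bits = [TestAndSetBit() for _ in range(n_jobs)]
    performed = []                      # (worker, job) pairs, hypothetical trace
    trace_lock = threading.Lock()

    def worker(pid):
        for job in range(n_jobs):
            if not bits[job].test_and_set():    # old value False: we own the job
                with trace_lock:
                    performed.append((pid, job))

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return performed
```

With such a primitive no job can be claimed twice, which is why the read/write-only setting studied here is the harder one.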

Contributions: In this paper we present and analyze the algorithm $\mathrm{KK}_\beta$ that solves the
at-most-once problem. The algorithm is parametrized by $\beta \ge m$ and has effectiveness
$n - \beta - m + 2$. If $\beta < m$ the correctness of the algorithm is still guaranteed, but the
termination of the algorithm cannot be guaranteed. For $\beta = m$ the algorithm has optimal effectiveness
of $n - 2m + 2$ up to an additive factor of $m$. Note that the upper bound for the effectiveness of any
algorithm is $n - f$ [15], where $f \le m - 1$ is the number of failures in the system. We further prove
that for $\beta \ge 3m^2$ the algorithm has work complexity $O(nm \log n \log m)$. We use algorithm
$\mathrm{KK}_\beta$ with $\beta = 3m^2$ in order to construct an iterated version of our algorithm which,
for any constant $\epsilon > 0$, has effectiveness of $n - O(m^2 \log n \log m)$ and work complexity
$O(n + m^{3+\epsilon} \log n)$. This is both effectiveness-optimal and work-optimal for any
$m = O(\sqrt[3+\epsilon]{n/\log n})$. We note that our solutions are deterministic and assume worst-case
behavior. In the probabilistic setting Hillel [13] shows that optimal effectiveness can be achieved with
expected work complexity $O(nm^2 \log m)$.
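
As a small worked illustration of the bounds just stated, the sketch below evaluates the effectiveness $n - (\beta + m - 2)$ of $\mathrm{KK}_\beta$ against the upper bound $n - f$. The function names and the sample values of $n$ and $m$ are assumptions chosen only for the example.

```python
def kk_effectiveness(n, m, beta):
    """Effectiveness n - (beta + m - 2) claimed for KK_beta, for beta >= m."""
    if beta < m:
        raise ValueError("termination is only guaranteed for beta >= m")
    return n - (beta + m - 2)


def effectiveness_upper_bound(n, f):
    """Upper bound n - f on the effectiveness of any algorithm with f crashes."""
    return n - f


# Hypothetical instance: n = 1000 jobs, m = 10 processes, beta = m.
n, m = 1000, 10
ours = kk_effectiveness(n, m, beta=m)            # equals n - 2m + 2
bound = effectiveness_upper_bound(n, f=m - 1)    # equals n - m + 1
gap = bound - ours                               # smaller than m
```

For these sample values the gap between the two quantities is $m - 1$, matching the "additive factor of $m$" in the text.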

We then demonstrate how to use the iterated version of our algorithm in order to solve the Write-All
problem with work complexity $O(n + m^{3+\epsilon} \log n)$ for any constant $\epsilon > 0$. Our solution
improves on the algorithm of Malewicz [24], which is the best known explicit result, in two ways. Firstly
our solution is work optimal for a wider range of $m$, namely for any
$m = O(\sqrt[3+\epsilon]{n/\log n})$ compared to the $m = O(\sqrt[4]{n/\log n})$ of Malewicz. Secondly our
solution does not assume the test-and-set primitive used by Malewicz [24], and relies only on atomic
read/write memory. Note that there is a Write-All algorithm due to Kowalski and Shvartsman [16], which is
work optimal for a wider range of processors $m$ than our algorithm, specifically for
$m = O(\sqrt[2+\epsilon]{n})$. However, their algorithm uses a collection of $q$ permutations with
contention $O(q \log q)$, while it is not known to date how to construct such permutations in polynomial
time. Thus their result is so far existential, while ours is explicit.

2 Model, Definitions, and Efficiency 

We define our model, the at-most-once problem, and measures of efficiency. 

2.1 Model and Adversary 

We model a multi-processor as $m$ asynchronous, crash-prone processes with unique identifiers from some
set $\mathcal{P}$. Shared memory is modeled as a collection of atomic read/write memory cells, where the
number of bits in each cell is explicitly defined. We use the Input/Output Automata formalism [22, 23] to
specify and reason about algorithms; specifically, we use the asynchronous shared memory automaton
formalization [9, 23]. Each process $p$ is defined in terms of its states $states_p$ and its actions
$acts_p$, where each action is of the type input, output, or internal. A subset $start_p \subseteq states_p$
contains all the start states of $p$. Each shared variable $x$ takes values from a set $V_x$, among which
there is $init_x$, the initial value of $x$.

We model an algorithm $A$ as a composition of the automata for each process $p$. Automaton $A$ consists
of a set of states $states(A)$, where each state $s$ contains a state $s_p \in states_p$ for each $p$, and
a value $v \in V_x$ for each shared variable $x$. Start states $start(A)$ is a subset of $states(A)$, where
each state contains a $start_p$ for each $p$ and an $init_x$ for each $x$. The actions of $A$, $acts(A)$,
consist of actions $\pi \in acts_p$ for each process $p$. A transition is the modification of the state as
a result of an action and is represented by a triple $(s, \pi, s')$, where $s, s' \in states(A)$ and
$\pi \in acts(A)$. The set of all transitions is denoted by $trans(A)$. Each action in $acts(A)$ is
performed by a process, thus for any transition $(s, \pi, s')$, $s$ and $s'$ may differ only with respect
to the state $s_p$ of process $p$ that invoked $\pi$ and potentially the value of the shared variable that
$p$ interacts with during $\pi$. We also use triples $(\{vars_s\}, \pi, \{vars_{s'}\})$, where $vars_s$
and $vars_{s'}$ are subsets of variables in $s$ and $s'$ respectively, as a shorthand to describe
transitions without having to specify $s$ and $s'$ completely; here $vars_s$ and $vars_{s'}$ contain only
the variables whose value changes as the result of $\pi$, plus possibly some other variables of interest.

An execution fragment of $A$ is either a finite sequence, $s_0, \pi_1, s_1, \ldots, \pi_r, s_r$, or an
infinite sequence, $s_0, \pi_1, s_1, \ldots, \pi_r, s_r, \ldots$, of alternating states and actions, where
$(s_k, \pi_{k+1}, s_{k+1}) \in trans(A)$ for any $k \ge 0$. If $s_0 \in start(A)$, then the sequence is
called an execution. The set of executions of $A$ is $execs(A)$. We say that execution $\alpha$ is fair, if
$\alpha$ is finite and its last state is a state of $A$ where no locally controlled action is enabled, or
$\alpha$ is infinite and every locally controlled action $\pi \in acts(A)$ is performed infinitely many
times or there are infinitely many states in $\alpha$ where $\pi$ is disabled. The set of fair executions
of $A$ is $fairexecs(A)$.



An execution fragment $\alpha'$ extends a finite execution fragment $\alpha$ of $A$, if $\alpha'$ begins
with the last state of $\alpha$. We let $\alpha \cdot \alpha'$ stand for the execution fragment resulting
from concatenating $\alpha$ and $\alpha'$ and removing the (duplicated) first state of $\alpha'$.
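
The concatenation $\alpha \cdot \alpha'$ can be sketched as follows, assuming (for illustration only) that an execution fragment is represented as a Python list alternating states and actions, beginning and ending with a state. The function name `concat` is hypothetical.

```python
def concat(alpha, alpha_prime):
    """Return alpha . alpha', dropping the duplicated first state of alpha'."""
    if alpha[-1] != alpha_prime[0]:
        raise ValueError("alpha' must begin with the last state of alpha")
    return alpha + alpha_prime[1:]


# Fragments alternate states and actions: s0, pi1, s1 and s1, pi2, s2.
alpha = ["s0", "pi1", "s1"]
alpha_prime = ["s1", "pi2", "s2"]
extended = concat(alpha, alpha_prime)
```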

For two states $s$ and $s'$ of an execution fragment $\alpha$, we say that state $s$ precedes state $s'$
and we write $s < s'$ if $s$ appears before $s'$ in $\alpha$. Moreover we write $s \le s'$ if state $s$
either precedes state $s'$ in $\alpha$ or the states $s$ and $s'$ are the same state of $\alpha$. We use
the term precedes and the symbols $<$ and $\le$ in the same way for the actions of an execution fragment.
We use the term precedes and the symbol $<$ if an action $\pi$ appears before a state $s$ in an execution
fragment $\alpha$ or if a state $s$ appears before an action $\pi$ in $\alpha$. Finally for a set of
states $S$ of an execution fragment $\alpha$, we define as $s_{max} = \max S$ the state $s_{max} \in S$,
s.t. $\forall s \in S$, $s \le s_{max}$ in $\alpha$.

We model process crashes by action $stop_p$ in $acts(A)$ for each process $p$. If $stop_p$ appears in an
execution $\alpha$ then no actions $\pi \in acts_p$ appear in $\alpha$ thereafter. We then say that process
$p$ crashed. Actions $stop_p$ arrive from some unspecified external environment, called adversary. In this
work we consider an omniscient, on-line adversary [14] that has complete knowledge of the algorithm
executed by the processes. The adversary controls asynchrony and crashes. We allow up to $f < m$ crashes.
We denote by $fairexecs_f(A)$ all fair executions of $A$ with at most $f$ crashes. Note that since the
processes can only communicate through atomic read/write operations in the shared memory, all the
asynchronous executions are linearizable. This means that concurrent actions can be mapped to an
equivalent sequence of state transitions, where only one process performs an action in each transition,
and thus the model presented above is appropriate for the analysis of a multi-process asynchronous atomic
read/write shared memory system.

2.2 At-Most-Once Problem, Effectiveness and Complexity 

We consider algorithms that perform a set of tasks, called jobs. Let $A$ be an algorithm specified for $m$
processes with ids from set $\mathcal{P} = [1 \ldots m]$, and for $n$ jobs with unique ids from set
$\mathcal{J} = [1 \ldots n]$. We assume that there are at least as many jobs as there are processes, i.e.,
$n \ge m$. We model the performance of job $j$ by process $p$ by means of action $do_{p,j}$. For a sequence
$c$, we let $len(c)$ denote its length, and we let $c|_\pi$ denote the sequence of elements $\pi$ occurring
in $c$. Then for an execution $\alpha$, $len(\alpha|_{do_{p,j}})$ is the number of times process $p$
performs job $j$. Finally we denote by $F_\alpha = \{p \mid stop_p \text{ occurs in } \alpha\}$ the set of
crashed processes in execution $\alpha$. Now we define the number of jobs performed in an execution. Note
here that we are borrowing most definitions from Kentros et al. [15].

Definition 2.1 For execution $\alpha$ we denote by
$J_\alpha = \{j \in \mathcal{J} \mid do_{p,j} \text{ occurs in } \alpha \text{ for some } p \in \mathcal{P}\}$.
The total number of jobs performed in $\alpha$ is defined to be $Do(\alpha) = |J_\alpha|$.

We next define the at-most-once problem.

Definition 2.2 Algorithm $A$ solves the at-most-once problem if for each execution $\alpha$ of $A$ we have
$\forall j \in \mathcal{J} : \sum_{p \in \mathcal{P}} len(\alpha|_{do_{p,j}}) \le 1$.
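
Definitions 2.1 and 2.2 can be checked mechanically on a recorded execution. The sketch below assumes, purely for illustration, that an execution is given as a list of tuples `("do", p, j)` and `("stop", p)`; this trace format and the function names are not part of the model above.

```python
from collections import Counter


def jobs_performed(execution):
    """Do(alpha): number of distinct jobs j with some do_{p,j} in the trace."""
    return len({action[2] for action in execution if action[0] == "do"})


def solves_at_most_once(execution):
    """Definition 2.2 on one trace: every job is performed at most once overall."""
    counts = Counter(action[2] for action in execution if action[0] == "do")
    return all(c <= 1 for c in counts.values())


# Hypothetical traces: the second one performs job 7 twice (by processes 1 and 2).
good = [("do", 1, 7), ("do", 2, 3), ("stop", 2), ("do", 1, 4)]
bad = [("do", 1, 7), ("do", 2, 7)]
```

Note that the check sums over all processes, so a job repeated by the same process and a job performed by two different processes are both violations.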

Definition 2.3 Let $S$ be a set of elements with unique identifiers. We define as the rank of element
$x \in S$, and we write $[x]_S$, the rank of $x$ if we sort in ascending order the elements of $S$
according to their identifiers.

Measures of Efficiency. We analyze our algorithms in terms of two complexity measures: effectiveness 
and work. Effectiveness counts the number of jobs performed by an algorithm in the worst case. 

Definition 2.4 The effectiveness of algorithm $A$ is:
$E_A(n, m, f) = \min_{\alpha \in fairexecs_f(A)} (Do(\alpha))$, where $m$ is the number of processes, $n$
is the number of jobs, and $f$ is the number of crashes.



A trivial algorithm can solve the at-most-once problem by splitting the $n$ jobs in groups of size
$\frac{n}{m}$ and assigning one group to each process. Such a solution has effectiveness
$E(n, m, f) = (m - f) \cdot \frac{n}{m}$ (consider an execution where $f$ processes fail at the beginning
of the execution).
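
The effectiveness of this trivial partitioning scheme can be reproduced with a few lines of code. The sketch assumes that $m$ divides $n$ and that crashed processes fail before performing any job; `trivial_run` is a hypothetical helper name.

```python
def trivial_run(n, m, crashed):
    """Static partition: process p owns jobs p*(n/m) .. (p+1)*(n/m) - 1."""
    if n % m != 0:
        raise ValueError("this sketch assumes m divides n")
    size = n // m
    performed = []
    for p in range(m):
        if p in crashed:          # crashed at the very beginning: group is lost
            continue
        performed.extend(range(p * size, (p + 1) * size))
    return performed


# Hypothetical instance: n = 100, m = 10, f = 3 crashes at the start.
done_jobs = trivial_run(100, 10, crashed={0, 1, 2})
effectiveness = len(done_jobs)    # (m - f) * n / m
upper_bound = 100 - 3             # n - f from the general upper bound
```

Each crash costs a whole group of $n/m$ jobs here, whereas the upper bound $n - f$ allows for losing only one job per crash.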

Work complexity measures the total number of basic operations (comparisons, additions, multiplications,
shared memory reads and writes) performed by an algorithm. We assume that each internal or shared
memory cell has size $O(\log n)$ bits and performing operations involving a constant number of memory
cells costs $O(1)$. This is consistent with the way work complexity is measured in previous related work
[?].

Definition 2.5 The work of algorithm $A$, denoted by $W_A$, is the worst case total number of basic
operations performed by all the processes of algorithm $A$.

Finally we repeat here as a Theorem, Corollary 1 from Kentros et al. [15], that gives an upper bound on
the effectiveness for any algorithm solving the at-most-once problem.

Theorem 2.1 (from Kentros et al. [15]) For all algorithms $A$ that solve the at-most-once problem with $m$
processes and $n \ge m$ jobs in the presence of $f < m$ crashes it holds that $E_A(n, m, f) \le n - f$.

3 Algorithm $\mathrm{KK}_\beta$

Here we present algorithm $\mathrm{KK}_\beta$, that solves the at-most-once problem. Parameter
$\beta \in \mathbb{N}$ is the termination parameter of the algorithm. Algorithm $\mathrm{KK}_\beta$ is
defined for all $\beta \ge m$. If $\beta = m$, algorithm $\mathrm{KK}_\beta$ has effectiveness that is
optimal up to an additive factor of $m$. Note that although $\beta \ge m$ is not necessary in order to
prove the correctness of the algorithm, if $\beta < m$ we cannot guarantee termination of algorithm
$\mathrm{KK}_\beta$.

The idea behind the algorithm $\mathrm{KK}_\beta$ (see Fig. 1) is quite intuitive and is based on an
algorithm for renaming processes presented by Attiya et al. [?]. Each process $p$ picks a job $i$ to
perform, announces (by writing in shared memory) that it is about to perform the job and then checks if it
is safe to perform it (by reading the announcements other processes made in the shared memory, and the jobs
other processes announced they have performed). If it is safe to perform the job $i$, process $p$ will
proceed with the $do_{p,i}$ action and then mark the job completed. If it is not safe to perform $i$, $p$
will release the job. In either case, $p$ picks a new job to perform. In order to pick a new job, $p$ reads
from the shared memory and gathers information on which jobs are safe to perform, by reading the
announcements that other processes made in the shared memory about the jobs they are about to perform, and
the jobs other processes announced they have already performed. Assuming that those jobs are ordered, $p$
splits the set of "free" jobs in $m$ intervals and picks the first job of the interval with rank equal to
$p$'s rank. Note that since the information needed in order to decide whether it is safe to perform a
specific job and in order to pick the next job to perform is the same, these steps are combined in the
algorithm. In Figure 1 we use function $rank(\mathrm{SET}_1, \mathrm{SET}_2, i)$, that returns the element
of set $\mathrm{SET}_1 \setminus \mathrm{SET}_2$ that has rank $i$. If $\mathrm{SET}_1$ and
$\mathrm{SET}_2$ have $O(n)$ elements and are stored in some tree structure like red-black tree or some
variant of B-tree, the operation $rank(\mathrm{SET}_1, \mathrm{SET}_2, i)$ costs
$O(|\mathrm{SET}_2| \log n)$ assuming that $\mathrm{SET}_2 \subseteq \mathrm{SET}_1$. We will prove that
the algorithm has effectiveness $n - (\beta + m - 2)$. For $\beta = O(m)$ this effectiveness is
asymptotically optimal for any $m = o(n)$. Note that by Theorem 2.1 the upper bound on effectiveness of the
at-most-once problem is $n - f$, where $f$ is the number of failed processes in the system. Next we present
algorithm $\mathrm{KK}_\beta$ in more detail.
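
The semantics of the $rank$ function can be sketched directly. The version below sorts the set difference on every call, which is simpler but more expensive than the balanced-tree implementation with cost $O(|\mathrm{SET}_2| \log n)$ assumed in the text; it is meant only to pin down what the function returns.

```python
def rank(set1, set2, i):
    """Return the element of set1 \\ set2 with rank i (ranks start at 1)."""
    remaining = sorted(set1 - set2)
    if not 1 <= i <= len(remaining):
        raise IndexError("rank out of range")
    return remaining[i - 1]


# Hypothetical example: jobs 1..10 are free, jobs 2 and 3 are announced by others.
free_jobs = set(range(1, 11))
announced = {2, 3}
second = rank(free_jobs, announced, 2)   # remaining jobs are 1, 4, 5, ..., 10
```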

Shared Variables. $next$ is an array with $m$ elements. In the cell $next_q$ of the array process $q$
announces the job it is about to perform. From the structure of algorithm $\mathrm{KK}_\beta$, only process
$q$ writes in cell $next_q$. On the other hand any process may read cell $next_q$.



Shared Variables:
  next = {next_1, ..., next_m}, next_q ∈ {0, ..., n}, initially 0
  done = {done_{1,1}, ..., done_{m,n}}, done_{q,i} ∈ {0, ..., n}, initially 0

Signature:
  Input:           stop_p, p ∈ P
  Output:          do_{p,j}, p ∈ P, j ∈ J
  Internal:        compNext_p, p ∈ P
                   check_p, p ∈ P
  Internal Read:   gatherTry_p, p ∈ P
                   gatherDone_p, p ∈ P
  Internal Write:  setNext_p, p ∈ P
                   done_p, p ∈ P

State:
  STATUS_p ∈ {comp_next, set_next, gather_try, gather_done, check, do, done, end, stop},
      initially STATUS_p = comp_next
  FREE_p, DONE_p, TRY_p ⊆ J, initially FREE_p = J and DONE_p = TRY_p = ∅
  POS_p = {POS_p(1), ..., POS_p(m)}, where POS_p(i) ∈ {1, ..., n}, initially POS_p(i) = 1
  NEXT_p ∈ {1, ..., n + 1}, initially undefined
  Q_p ∈ {1, ..., m}, initially 1
  TMP_p ∈ {0, ..., n}, initially undefined

Transitions of process p:

  Input stop_p
    Effect:
      STATUS_p ← stop

  Internal compNext_p
    Precondition:
      STATUS_p = comp_next
    Effect:
      if |FREE_p \ TRY_p| ≥ β then
        TMP_p ← (|FREE_p| - (m - 1)) / m
        if TMP_p ≥ 1 then
          TMP_p ← ⌊(p - 1) · TMP_p⌋ + 1
          NEXT_p ← rank(FREE_p, TRY_p, TMP_p)
        else
          NEXT_p ← rank(FREE_p, TRY_p, p)
        end
        Q_p ← 1
        TRY_p ← ∅
        STATUS_p ← set_next
      else
        STATUS_p ← end
      end

  Internal Write setNext_p
    Precondition:
      STATUS_p = set_next
    Effect:
      next_p ← NEXT_p
      STATUS_p ← gather_try

  Internal Read gatherTry_p
    Precondition:
      STATUS_p = gather_try
    Effect:
      if Q_p ≠ p then
        TMP_p ← next_{Q_p}
        if TMP_p ≤ n then
          TRY_p ← TRY_p ∪ {TMP_p}
        end
      end
      if Q_p + 1 ≤ m then
        Q_p ← Q_p + 1
      else
        Q_p ← 1
        STATUS_p ← gather_done
      end

  Internal Read gatherDone_p
    Precondition:
      STATUS_p = gather_done
    Effect:
      if Q_p ≠ p then
        TMP_p ← done_{Q_p, POS_p(Q_p)}
        if POS_p(Q_p) ≤ n AND TMP_p > 0 then
          DONE_p ← DONE_p ∪ {TMP_p}
          FREE_p ← FREE_p \ {TMP_p}
          POS_p(Q_p) ← POS_p(Q_p) + 1
        else Q_p ← Q_p + 1
        end
      else Q_p ← Q_p + 1
      end
      if Q_p > m then
        Q_p ← 1
        STATUS_p ← check
      end

  Internal check_p
    Precondition:
      STATUS_p = check
    Effect:
      if NEXT_p ∉ TRY_p AND NEXT_p ∉ DONE_p
        then STATUS_p ← do
      else
        STATUS_p ← comp_next
      end

  Output do_{p,j}
    Precondition:
      STATUS_p = do
      NEXT_p = j
    Effect:
      STATUS_p ← done

  Internal Write done_p
    Precondition:
      STATUS_p = done
    Effect:
      done_{p, POS_p(p)} ← NEXT_p
      DONE_p ← DONE_p ∪ {NEXT_p}
      FREE_p ← FREE_p \ {NEXT_p}
      POS_p(p) ← POS_p(p) + 1
      STATUS_p ← comp_next

Figure 1: Algorithm $\mathrm{KK}_\beta$: Shared Variables, Signature, States and Transitions

$done$ is an $m \times n$ matrix. In line $q$ of the matrix, process $q$ announces the jobs it has
performed. Each cell of line $q$ contains the identifier of exactly one job that has been performed by
process $q$. Only process $q$ writes in the cells of line $q$ but any process may read them. Moreover,
process $q$ updates line $q$ by adding entries at the end of it.

Internal Variables of process $p$. The variable $\mathrm{STATUS}_p \in \{comp\_next, set\_next,
gather\_try, gather\_done, check, do, done, end, stop\}$ records the status of process $p$ and defines its
next action as follows: $comp\_next$ - process $p$ is ready to compute the next job to perform (this is the
initial status of $p$), $set\_next$ - $p$ computed the next job to perform and is ready to announce it,
$gather\_try$ - $p$ reads the array $next$ in shared memory in order to compute the TRY set,
$gather\_done$ - $p$ reads the matrix $done$ in shared memory in order to update the DONE and FREE sets,
$check$ - $p$ has to check whether it is safe to perform its current job, $do$ - $p$ can safely perform its
current job, $done$ - $p$ performed its current job, $end$ - $p$ terminated, $stop$ - $p$ crashed.

$\mathrm{FREE}_p, \mathrm{DONE}_p, \mathrm{TRY}_p \subseteq \mathcal{J}$ are three sets that are used by
process $p$ in order to compute the next job to perform and whether it is safe to perform it. We use some
tree structure like red-black tree or some variant of B-tree [?] for the sets $\mathrm{FREE}_p$,
$\mathrm{DONE}_p$ and $\mathrm{TRY}_p$, in order to be able to add, remove and search elements in them in
$O(\log n)$. $\mathrm{FREE}_p$ is initially set to $\mathcal{J}$ and contains an estimate of the jobs that
are still available. $\mathrm{DONE}_p$ is initially empty and contains an estimate of the jobs that have
been performed. No job is removed from $\mathrm{DONE}_p$ or added to $\mathrm{FREE}_p$ during the
execution of algorithm $\mathrm{KK}_\beta$. $\mathrm{TRY}_p$ is initially empty and contains an estimate of
the jobs that other processes are about to perform. It holds that $|\mathrm{TRY}_p| < m$, since there are
$m - 1$ processes apart from process $p$ that may be attempting to perform a job.

$\mathrm{POS}_p$ is an array of $m$ elements. Position $\mathrm{POS}_p(q)$ of the array contains a pointer
in the line $q$ of the shared matrix $done$. $\mathrm{POS}_p(q)$ is the element of line $q$ that process
$p$ will read from. In the special case where $q = p$, $\mathrm{POS}_p(p)$ is the element of line $p$ that
process $p$ will write into after performing a new job. The elements of the shared matrix $done$ are read
when process $p$ is updating the $\mathrm{DONE}_p$ set.

$\mathrm{NEXT}_p$ contains the job process $p$ is attempting to perform.

$\mathrm{TMP}_p$ is a temporary storage for values read from the shared memory.

$Q_p \in \{1, \ldots, m\}$ is used as indexing for looping through process identifiers.

Actions of process p. We visit them one by one below. 

$compNext_p$: Process $p$ computes the set $\mathrm{FREE}_p \setminus \mathrm{TRY}_p$ and if it has at
least $\beta$ elements, where $\beta$ is the termination parameter of the algorithm, process $p$ computes
its next candidate job, by splitting the $\mathrm{FREE}_p \setminus \mathrm{TRY}_p$ set in $m$ parts and
picking the first element of the $p$-th part. In order to do that it uses the function
$rank(\mathrm{SET}_1, \mathrm{SET}_2, i)$, which returns the element of set
$\mathrm{SET}_1 \setminus \mathrm{SET}_2$ with rank $i$. Finally process $p$ sets the $\mathrm{TRY}_p$ set
to the empty set, the $Q_p$ internal variable to 1 and its status to $set\_next$ in order to update the
shared memory with its new candidate job. If the $\mathrm{FREE}_p \setminus \mathrm{TRY}_p$ set has fewer
than $\beta$ elements, process $p$ terminates.
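
The job selection performed by $compNext_p$ can be sketched as a pure function of the local sets. The sketch follows the transition in Figure 1, assuming $\beta \ge m$; the name `comp_next`, the use of `None` for termination, and the integer form of the floor (which avoids floating point) are choices made for the illustration.

```python
def comp_next(p, m, beta, free, try_set):
    """Candidate job for process p (1-based), or None if p should terminate."""
    candidates = sorted(free - try_set)       # FREE_p \ TRY_p in ascending order
    if len(candidates) < beta:
        return None                           # STATUS_p becomes end
    k = len(free) - (m - 1)                   # TMP_p = k / m in Figure 1
    if k >= m:                                # same test as TMP_p >= 1
        index = ((p - 1) * k) // m + 1        # floor((p - 1) * TMP_p) + 1
    else:
        index = p
    return candidates[index - 1]              # rank(FREE_p, TRY_p, index)


# Hypothetical instance: m = 4 processes, beta = 4, jobs 1..20 all free.
free = set(range(1, 21))
picks = [comp_next(p, 4, 4, free, set()) for p in range(1, 5)]
```

With an empty TRY set the four processes pick jobs spread across the free set, which is what keeps collisions between processes rare.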

$setNext_p$: Process $p$ announces its new candidate job by writing the contents of its $\mathrm{NEXT}_p$
internal variable in the $p$-th position of the $next$ array. Remember that the $next$ array is stored in
the shared memory. Process $p$ changes its status to $gather\_try$, in order to start collecting the
$\mathrm{TRY}_p$ set from the $next$ array.

$gatherTry_p$: With this action process $p$ implements a loop, which reads from the shared memory all the
positions of the array $next$ and updates the $\mathrm{TRY}_p$ set. In each execution of the action,
process $p$ checks if $Q_p$ is equal to $p$. If it is not equal, $p$ reads the $Q_p$-th position of the
array $next$, checks if the value read is less than $n + 1$ and if it is, adds the value it read to the
$\mathrm{TRY}_p$ set. If $Q_p$ is equal to $p$, $p$ just skips the step described above. Then $p$ checks if
the value of $Q_p + 1$ is less than $m + 1$. If it is, then $p$ increases $Q_p$ by 1 and leaves its status
$gather\_try$, otherwise $p$ has finished updating the $\mathrm{TRY}_p$ set and thus sets $Q_p$ to 1 and
changes its status to $gather\_done$, in order to update the $\mathrm{DONE}_p$ and $\mathrm{FREE}_p$ sets
from the contents of the $done$ matrix.

$gatherDone_p$: With this action process $p$ implements a loop, which updates the $\mathrm{DONE}_p$ and
$\mathrm{FREE}_p$ sets with values read from the matrix $done$, which is stored in shared memory. In each
execution of the action, process $p$ checks if $Q_p$ is equal to $p$. If it is not equal, $p$ uses the
internal variable $\mathrm{POS}_p(Q_p)$ in order to read fresh values from the line $Q_p$ of the $done$
matrix. In detail, $p$ reads the shared variable $done_{Q_p, \mathrm{POS}_p(Q_p)}$, checks if
$\mathrm{POS}_p(Q_p)$ is less than $n + 1$ and if the value read is greater than 0. If both conditions
hold, $p$ adds the value read to the $\mathrm{DONE}_p$ set, removes the value read from the
$\mathrm{FREE}_p$ set and increases $\mathrm{POS}_p(Q_p)$ by one. Otherwise, it means that either process
$Q_p$ has terminated (by performing all the $n$ jobs) or the line $Q_p$ does not contain any new completed
jobs. In either case $p$ increases the value of $Q_p$ by 1. The value of $Q_p$ is increased by 1 also if
$Q_p$ was equal to $p$. Finally $p$ checks whether $Q_p$ is greater than $m$; if it is, $p$ has completed
the loop and thus changes its status to $check$.



$check_p$: Process $p$ checks if it is safe to perform its current job. This is done by checking if
$\mathrm{NEXT}_p$ belongs to the set $\mathrm{TRY}_p$ or to the set $\mathrm{DONE}_p$. If it does not, then
it is safe to perform the job $\mathrm{NEXT}_p$ and $p$ changes its status to $do$. Otherwise it is not
safe, and thus $p$ changes its status to $comp\_next$, in order to find a new job that may be safe to
perform.

$do_{p,j}$: Process $p$ performs job $j$. Note that $\mathrm{NEXT}_p = j$ is part of the preconditions for
the action to be enabled in a state. Then $p$ changes its status to $done$.

$done_p$: Process $p$ writes in the $done_{p, \mathrm{POS}_p(p)}$ position of the shared memory the value
of $\mathrm{NEXT}_p$, letting other processes know that it performed job $\mathrm{NEXT}_p$. Also $p$ adds
$\mathrm{NEXT}_p$ to its $\mathrm{DONE}_p$ set, removes $\mathrm{NEXT}_p$ from its $\mathrm{FREE}_p$ set,
increases $\mathrm{POS}_p(p)$ by 1 and changes its status to $comp\_next$.

$stop_p$: Process $p$ crashes by setting its status to $stop$.
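
To make the interplay of the actions concrete, the following sketch simulates $\mathrm{KK}_\beta$ with one atomic action per step under a random interleaving. It is an illustration rather than a reference implementation, and it rests on several assumptions: the function name `simulate_kk`, the dictionary-based process state, the random scheduler, crashes that occur only before a process takes its first step, the choice to skip the value 0 when gathering the TRY set, and a step budget used as a safety net.

```python
import random


def simulate_kk(n, m, beta, crashed=(), seed=0, max_steps=10**6):
    """Return the list of (process, job) pairs performed in one interleaving."""
    rng = random.Random(seed)
    nxt = [0] * (m + 1)                            # shared array next, cells 1..m
    done = [[0] * (n + 2) for _ in range(m + 1)]   # shared matrix done, rows 1..m
    procs = {p: {"status": "comp_next", "free": set(range(1, n + 1)),
                 "done": set(), "try": set(), "pos": [1] * (m + 1),
                 "next": None, "q": 1}
             for p in range(1, m + 1) if p not in crashed}
    performed = []
    for _ in range(max_steps):
        if not procs:                              # every live process reached end
            return performed
        p = rng.choice(sorted(procs))              # scheduler picks one process
        s = procs[p]
        st = s["status"]
        if st == "comp_next":
            cand = sorted(s["free"] - s["try"])
            if len(cand) >= beta:
                k = len(s["free"]) - (m - 1)
                idx = ((p - 1) * k) // m + 1 if k >= m else p
                s["next"] = cand[idx - 1]
                s["q"], s["try"], s["status"] = 1, set(), "set_next"
            else:
                del procs[p]                       # status end: p terminates
        elif st == "set_next":
            nxt[p] = s["next"]
            s["status"] = "gather_try"
        elif st == "gather_try":
            q = s["q"]
            if q != p and 0 < nxt[q] <= n:         # 0 means nothing announced yet
                s["try"].add(nxt[q])
            if q + 1 <= m:
                s["q"] = q + 1
            else:
                s["q"], s["status"] = 1, "gather_done"
        elif st == "gather_done":
            q = s["q"]
            tmp = done[q][s["pos"][q]] if q != p else 0
            if q != p and s["pos"][q] <= n and tmp > 0:
                s["done"].add(tmp)
                s["free"].discard(tmp)
                s["pos"][q] += 1
            else:
                s["q"] = q + 1
            if s["q"] > m:
                s["q"], s["status"] = 1, "check"
        elif st == "check":
            safe = s["next"] not in s["try"] and s["next"] not in s["done"]
            s["status"] = "do" if safe else "comp_next"
        elif st == "do":
            performed.append((p, s["next"]))       # the do_{p,j} action
            s["status"] = "done"
        elif st == "done":
            j = s["next"]
            done[p][s["pos"][p]] = j
            s["done"].add(j)
            s["free"].discard(j)
            s["pos"][p] += 1
            s["status"] = "comp_next"
    raise RuntimeError("step budget exhausted")
```

A run such as `simulate_kk(60, 4, 4, seed=1)` can then be checked against the two properties analyzed in the next section: no job appears twice in the result, and at least $n - (\beta + m - 2)$ jobs are performed.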

4 Correctness and Effectiveness Analysis

Next we begin the analysis of algorithm $\mathrm{KK}_\beta$, by proving that $\mathrm{KK}_\beta$ solves
the at-most-once problem. That is, there exists no execution of $\mathrm{KK}_\beta$ in which 2 distinct
actions $do_{p,i}$ and $do_{q,i}$ appear for some $i \in \mathcal{J}$ and $p, q \in \mathcal{P}$. In the
proofs, for a state $s$ and a process $p$ we denote by $s.\mathrm{FREE}_p$, $s.\mathrm{DONE}_p$,
$s.\mathrm{TRY}_p$, the values of the internal variables FREE, DONE and TRY of process $p$ in state $s$.
Moreover with $s.next$ and $s.done$ we denote the contents of the array $next$ and the matrix $done$ in
state $s$. Remember that $next$ and $done$ are stored in shared memory.

Lemma 4.1 There exists no execution $\alpha$ of algorithm $\mathrm{KK}_\beta$, such that
$\exists i \in \mathcal{J}$ and $\exists p, q \in \mathcal{P}$ for which $do_{p,i}, do_{q,i} \in \alpha$.

Proof. Let us for the sake of contradiction assume that there exists an execution
$\alpha \in execs(\mathrm{KK}_\beta)$ and $i \in \mathcal{J}$ and $p, q \in \mathcal{P}$ such that
$do_{p,i}, do_{q,i} \in \alpha$. We examine two cases.



Case 1 $p = q$: Let states $s_1, s'_1, s_2, s'_2 \in \alpha$, such that the transitions
$(s_1, do_{p,i}, s'_1), (s_2, do_{p,i}, s'_2) \in \alpha$ and without loss of generality assume
$s'_1 \le s_2$ in $\alpha$. From Figure 1 we have that $s'_1.\mathrm{NEXT}_p = i$,
$s'_1.\mathrm{STATUS}_p = done$ and $s_2.\mathrm{NEXT}_p = i$, $s_2.\mathrm{STATUS}_p = do$. From
algorithm $\mathrm{KK}_\beta$, state $s_2$ must be preceded in $\alpha$ by transition
$(s_3, check_p, s'_3)$, such that $s_3.\mathrm{NEXT}_p = i$ and $s'_3.\mathrm{NEXT}_p = i$,
$s'_3.\mathrm{STATUS}_p = do$, where $s'_1$ precedes $s_3$ in $\alpha$. Finally $s_3$ must be preceded in
$\alpha$ by transition $(s_4, done_p, s'_4)$, where $s'_1$ precedes $s_4$, such that
$s_4.\mathrm{NEXT}_p = i$ and $i \in s'_4.\mathrm{DONE}_p$. Since $s'_4$ precedes $s_3$ and during the
execution of $\mathrm{KK}_\beta$ no elements are removed from $\mathrm{DONE}_p$, we have that
$i \in s_3.\mathrm{DONE}_p$. This is a contradiction, since the transition
$(\{\mathrm{NEXT}_p = i, i \in \mathrm{DONE}_p\}, check_p, \{\mathrm{NEXT}_p = i, \mathrm{STATUS}_p = do\})
\notin trans(\mathrm{KK}_\beta)$.

Case 2 $p \ne q$: Given the transition $(s_1, do_{p,i}, s'_1)$ in $\alpha$, we deduce from Figure 1 that
there exist in $\alpha$ transitions $(s_2, setNext_p, s'_2)$, $(s_3, gatherTry_p, s'_3)$,
$(s_4, check_p, s'_4)$, where $s'_2.next_p = s_2.\mathrm{NEXT}_p = i$,
$s_3.next_p = s_3.\mathrm{NEXT}_p = i$, $s_3.Q_p = q$, $s_4.\mathrm{NEXT}_p = i$,
$s'_4.\mathrm{NEXT}_p = i$, $s'_4.\mathrm{STATUS}_p = do$, such that $s_2 < s_3 < s_4 < s_1$ and there
exists no action $\pi = compNext_p$ in $\alpha$ between states $s_2$ and $s_1$. This essentially means that
in the execution fragment $\alpha' \in \alpha$ starting from state $s_2$ and ending with $s_1$ there exists
only a single $check_p$ action (the one in transition $(s_4, check_p, s'_4)$) that leads to the
performance of job $i$. Similarly for transition $(t_1, do_{q,i}, t'_1)$ there exist in $\alpha$
transitions $(t_2, setNext_q, t'_2)$, $(t_3, gatherTry_q, t'_3)$, $(t_4, check_q, t'_4)$, where
$t'_2.next_q = t_2.\mathrm{NEXT}_q = i$, $t_3.next_q = t_3.\mathrm{NEXT}_q = i$, $t_3.Q_q = p$,
$t_4.\mathrm{NEXT}_q = i$, $t'_4.\mathrm{NEXT}_q = i$, $t'_4.\mathrm{STATUS}_q = do$, such that
$t_2 < t_3 < t_4 < t_1$ and there is no action $\pi' = compNext_q$ occurring in $\alpha$ between states
$t_2$ and $t_1$.



In the execution $\alpha$, either state $s_2 < t_3$ or $t_3 < s_2$, which implies $t_2 < s_3$. We will show
that if $s_2 < t_3$ then $do_{q,i}$ cannot take place, leading to a contradiction. The case where
$t_2 < s_3$ is symmetric and will be omitted. So let us assume that $s_2$ precedes $t_3$ in $\alpha$. We
have two cases, either $t_3.next_p = i$ or $t_3.next_p \ne i$. In the first case
$i \in t'_3.\mathrm{TRY}_q$. From Figure 1 the only action in which entries are removed from the
$\mathrm{TRY}_q$ set is $compNext_q$, where the $\mathrm{TRY}_q$ set is reset to $\emptyset$. This means
that $i \in t_4.\mathrm{TRY}_q$ since $\nexists \pi' = compNext_q \in \alpha$, such that
$t_2 < \pi' < t_1$. This is a contradiction since $(t_4, check_q, t'_4) \notin trans(\mathrm{KK}_\beta)$,
if $i \in t_4.\mathrm{TRY}_q$, $t'_4.\mathrm{NEXT}_q = i$ and $t'_4.\mathrm{STATUS}_q = do$.

If $t_3.\mathit{next}_p \neq i$, since $(s_2, \mathsf{setNext}_p, s_2') \in \alpha$ and $s_2 < t_3$ there exists an action $\pi_1 = \mathsf{setNext}_p \in \alpha$ such that $s_2 < \pi_1 < t_3$. Moreover, from Figure 1 there exists an action $\pi_2 = \mathsf{compNext}_p$ in $\alpha$ such that $s_2 < \pi_2 < \pi_1$. Since $\nexists\, \pi = \mathsf{compNext}_p \in \alpha$ such that $s_2 < \pi < s_1$, it holds that $s_1 < \pi_2 < \pi_1 < t_3$. Furthermore, from Figure 1 there exists a transition $(s_5, \mathsf{done}_p, s_5')$ in $\alpha$ and $j \in \{1, \ldots, n\}$ such that $s_5.\mathrm{POS}_p(p) = j$, $s_5.\mathit{done}_{p,j} = 0$, $s_5.\mathrm{NEXT}_p = i$, $s_5'.\mathit{done}_{p,j} = i$ and $s_1 < s_5 < \pi_2 < t_3$. It must be the case that $i \notin t_2.\mathrm{DONE}_q$, since $t_2.\mathrm{NEXT}_q = i$. From that and from Figure 1 we have that there exists a transition $(t_6, \mathsf{gatherDone}_q, t_6')$ in $\alpha$ such that $t_6.\mathrm{Q}_q = p$, $t_6.\mathrm{POS}_q(p) = j$ and $t_3 < t_6 < t_4$. Since $s_5 < t_3$ and $\mathit{done}_{p,j}$ from Figure 1 cannot be changed again in execution $\alpha$, we have that $t_6.\mathit{done}_{p,j} = i$ and as a result $i \in t_6'.\mathrm{DONE}_q$. Moreover, during the execution of algorithm $\mathrm{KK}_\beta$ entries in set $\mathrm{DONE}_q$ are only added and never removed, thus we have that $i \in t_4.\mathrm{DONE}_q$. This is a contradiction since $(t_4, \mathsf{check}_q, t_4') \notin \mathit{trans}(\mathrm{KK}_\beta)$ if $i \in t_4.\mathrm{DONE}_q$, $t_4.\mathrm{NEXT}_q = i$ and $t_4'.\mathrm{STATUS}_q = \mathit{do}$. This completes the proof. $\Box$

Next we examine the effectiveness of the algorithm. 

Lemma 4.2 For any $\beta \ge m$, $f \le m - 1$ and for any finite execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ with $Do(\alpha) \le n - (\beta + m - 1)$, there exists a (non-empty) execution fragment $\alpha'$ such that $\alpha \cdot \alpha' \in \mathit{execs}(\mathrm{KK}_\beta)$.

Proof. From the algorithm $\mathrm{KK}_\beta$, we have that for any process $p$ and any state $s \in \alpha$, $|s.\mathrm{FREE}_p| \ge n - Do(\alpha)$ and $|s.\mathrm{TRY}_p| \le m - 1$. The first inequality holds since the $s.\mathrm{FREE}_p$ set is estimated by $p$ by examining the done matrix, which is stored in shared memory. From Figure 1 a job $j$ is only inserted in line $q$ of the matrix done if a $\mathsf{do}_{q,j}$ action has already been performed by process $q$. The second inequality holds because $\mathrm{TRY}_p$ receives at most one entry from each of the other $m - 1$ processes. Thus we have that $\forall p \in \mathcal{P}$ and $\forall s \in \alpha$, $|s.\mathrm{FREE}_p \setminus s.\mathrm{TRY}_p| \ge n - (Do(\alpha) + m - 1)$. If $Do(\alpha) \le n - (\beta + m - 1)$, then $\forall p \in \mathcal{P}$ and $\forall s \in \alpha$ we have that $|s.\mathrm{FREE}_p \setminus s.\mathrm{TRY}_p| \ge \beta$. Since there can be $f \le m - 1$ failed processes in our system, at the final state $s'$ of $\alpha$ there exists at least one process $p \in \mathcal{P}$ that has not failed. This process has not terminated, since from Figure 1 a process $p$ can only terminate if in the enabling state $s$ of action $\mathsf{compNext}_p$, $|s.\mathrm{FREE}_p \setminus s.\mathrm{TRY}_p| < \beta$. This process can continue executing steps and thus there exists a (non-empty) execution fragment $\alpha'$ such that $\alpha \cdot \alpha' \in \mathit{execs}(\mathrm{KK}_\beta)$. $\Box$
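The counting step in the proof of Lemma 4.2 can be checked numerically. The sketch below is only an illustration of the inequality $|\mathrm{FREE}_p \setminus \mathrm{TRY}_p| \ge n - (Do(\alpha) + m - 1)$; the function names and the idea of passing the number of performed jobs as an integer are assumptions made for this example and do not come from the paper.

```python
def guaranteed_candidates(n: int, m: int, jobs_done: int) -> int:
    """Lower bound on |FREE_p minus TRY_p| used in the proof of Lemma 4.2.

    Hypothetical helper: n jobs, m processes, jobs_done = number of jobs
    performed so far (the quantity the text writes as Do(alpha)).
    """
    # At most jobs_done jobs are missing from FREE_p and at most m - 1
    # further jobs can sit in TRY_p.
    return n - (jobs_done + m - 1)


def some_process_can_continue(n: int, m: int, beta: int, jobs_done: int) -> bool:
    """True when the bound alone already rules out termination (bound >= beta)."""
    return guaranteed_candidates(n, m, jobs_done) >= beta


if __name__ == "__main__":
    n, m, beta = 100, 4, 4
    threshold = n - (beta + m - 1)  # largest Do(alpha) covered by the lemma
    print(threshold, guaranteed_candidates(n, m, threshold))
```

With `n = 100`, `m = 4` and `beta = 4`, the lemma covers executions with up to 93 performed jobs, and at that point the bound still leaves `beta` candidate jobs, so no live process can have terminated.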

This means that if the $\mathrm{KK}_\beta$ algorithm has effectiveness less than or equal to $n - (\beta + m - 1)$, there should be some infinite fair execution $\alpha$ of the algorithm with $Do(\alpha) \le n - (\beta + m - 1)$ (since no finite execution of the algorithm with so few performed jobs could terminate). Next we prove that the algorithm $\mathrm{KK}_\beta$ is wait-free (the algorithm has no infinite fair executions) and thus there exists no such execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$.

Lemma 4.3 For any $\beta \ge m$, $f \le m - 1$ there exists no infinite fair execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$.

Proof. We will prove this by contradiction. Let $\beta \ge m$ and let $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ be an infinite fair execution with $f \le m - 1$ failures, and let $Do(\alpha)$ be the jobs executed by execution $\alpha$ according to Definition 2.1. Since $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ and from Lemma 4.1 $\mathrm{KK}_\beta$ solves the at-most-once problem, $Do(\alpha)$ is finite. Clearly there exists at least one process in $\alpha$ that has not crashed and does not terminate (some process needs to take steps in $\alpha$ in order for it to be infinite). Since $Do(\alpha)$ and $f$ are finite, there exists a state $s_0$ in $\alpha$ such that after $s_0$ no process crashes, no process terminates, no do action takes place in $\alpha$ and no process adds new entries in the done matrix in shared memory. The latter holds since the execution is infinite and fair and $Do(\alpha)$ is finite; consequently any non-failed process $q$ that has not terminated will eventually update line $q$ of the done matrix to be in agreement with the $\mathsf{do}_{q,*}$ actions it has performed. Moreover, any process $q$ that has terminated has already updated line $q$ of the done matrix with the latest do action it performed before it terminated, since in order to terminate it must have reached a $\mathsf{compNext}$ action that has set its status to end.

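We define the following sets of processes and jobs according to state $s_0$. $\mathcal{J}_\alpha$ are jobs that have been performed in $\alpha$ according to Definition 2.1. $\mathcal{P}_\alpha$ are processes that do not crash and do not terminate in $\alpha$. By the way we defined state $s_0$, only processes in $\mathcal{P}_\alpha$ take steps in $\alpha$ after state $s_0$. $\mathrm{STUCK}_\alpha = \{i \in \mathcal{J} \setminus \mathcal{J}_\alpha \mid \exists \text{ failed process } p : s_0.\mathit{next}_p = i\}$, i.e., $\mathrm{STUCK}_\alpha$ expresses the set of jobs that are held by failed processes. $\mathrm{DONE}_\alpha = \{i \in \mathcal{J}_\alpha \mid \exists p \in \mathcal{P} \text{ and } j \in \{1, \ldots, n\} : s_0.\mathit{done}_p(j) = i\}$, i.e., $\mathrm{DONE}_\alpha$ expresses the set of jobs that have been performed before state $s_0$ and for which the processes that performed them managed to update the shared memory. Finally we define $\mathrm{POOL}_\alpha = \mathcal{J} \setminus (\mathcal{J}_\alpha \cup \mathrm{STUCK}_\alpha)$. After state $s_0$, all processes in $\mathcal{P}_\alpha$ will keep executing. This means that whenever such a process $p \in \mathcal{P}_\alpha$ takes action $\mathsf{compNext}_p$ in $\alpha$, the first if statement is true. Specifically, it holds that $\forall p \in \mathcal{P}_\alpha$ and for all the enabling states $s \ge s_0$ of actions $\mathsf{compNext}_p$ in $\alpha$, $|\mathrm{FREE}_p \setminus \mathrm{TRY}_p| \ge \beta$.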

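From Figure 1 we have that for any $p \in \mathcal{P}_\alpha$, $\exists\, s_p$ after state $s_0$ in $\alpha$ such that $\forall$ states $s \ge s_p$, $s.\mathrm{DONE}_p = \mathrm{DONE}_\alpha$, $s.\mathrm{FREE}_p = \mathcal{J} \setminus \mathrm{DONE}_\alpha$ and $s.\mathrm{FREE}_p \setminus s.\mathrm{TRY}_p \subseteq \mathrm{POOL}_\alpha$. Let $s_0' = \max_{p \in \mathcal{P}_\alpha}[s_p]$. From the above we have $|\mathcal{J} \setminus \mathrm{DONE}_\alpha| \ge \beta \ge m$ and $|\mathrm{POOL}_\alpha| \ge \beta \ge m$, since $\forall p \in \mathcal{P}_\alpha$ we have that for all the enabling states $s \ge s_0'$ of actions $\mathsf{compNext}_p$ in $\alpha$, $|\mathrm{FREE}_p \setminus \mathrm{TRY}_p| \ge \beta$, and $\forall s' \ge s_0'$ we have that $s'.\mathrm{FREE}_p = \mathcal{J} \setminus \mathrm{DONE}_\alpha$ and $s'.\mathrm{FREE}_p \setminus s'.\mathrm{TRY}_p \subseteq \mathrm{POOL}_\alpha$.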

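Let $p_0$ be the process with the smallest process identifier in $\mathcal{P}_\alpha$. We examine 2 cases according to the size of $\mathcal{J} \setminus \mathrm{DONE}_\alpha$.

Case A $|\mathcal{J} \setminus \mathrm{DONE}_\alpha| \ge 2m - 1$: Let $x_0 \in \mathrm{POOL}_\alpha$ be the job such that $[x_0]_{\mathrm{POOL}_\alpha} = \left\lfloor (p_0 - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$. Such $x_0$ exists since $\forall p \in \mathcal{P}_\alpha$ and $\forall s \ge s_0'$ it holds that $s.\mathrm{FREE}_p \setminus s.\mathrm{TRY}_p \subseteq \mathrm{POOL}_\alpha$ and $s.\mathrm{FREE}_p = \mathcal{J} \setminus \mathrm{DONE}_\alpha$, from which we have that $|\mathrm{POOL}_\alpha| \ge |\mathcal{J} \setminus \mathrm{DONE}_\alpha| - |s.\mathrm{TRY}_p| \ge |\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1) \ge m$.

It follows that any $p \in \mathcal{P}_\alpha$ that executes action $\mathsf{compNext}_p$ after state $s_0'$ will have its $\mathrm{NEXT}_p$ variable pointing to a job $x$ with $[x]_{\mathrm{POOL}_\alpha} \ge \left\lfloor (p - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$. Thus $\forall p \in \mathcal{P}_\alpha$, $\exists\, s_p' \ge s_0'$ in $\alpha$ such that $\forall$ states $s \ge s_p'$, $[s.\mathit{next}_p]_{\mathrm{POOL}_\alpha} \ge \left\lfloor (p - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$. Let $s_0'' = \max_{p \in \mathcal{P}_\alpha}[s_p']$; we have to study 2 cases for $p_0$:

Case A.1) After $s_0''$, process $p_0$ executes action $\mathsf{compNext}_{p_0}$ and the transition leads to a state $s_1 > s_0''$ such that $s_1.\mathrm{NEXT}_{p_0} = x_0$. Since $[x_0]_{\mathrm{POOL}_\alpha} = \left\lfloor (p_0 - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$ and $p_0 = \min_{p \in \mathcal{P}_\alpha}[p]$, from the previous discussion we have that $\forall s \ge s_1$ and $\forall p \in \mathcal{P} \setminus \{p_0\}$, $s.\mathit{next}_p \neq x_0$. Thus when $p_0$ executes action $\mathsf{check}_{p_0}$ of Figure 1 for the first time after state $s_1$, the condition will be true, so in some subsequent transition $p_0$ will have to execute action $\mathsf{do}_{p_0, x_0}$, performing job $x_0$, which is a contradiction, since after state $s_0$ no jobs are executed.

Case A.2) After $s_0''$, process $p_0$ executes action $\mathsf{compNext}_{p_0}$ and the transition leads to a state $s_1 > s_0''$ such that $s_1.\mathrm{NEXT}_{p_0} > x_0$. Since $p_0 = \min_{p \in \mathcal{P}_\alpha}[p]$, it holds that $\forall x \in \mathrm{POOL}_\alpha$ such that $[x]_{\mathrm{POOL}_\alpha} \le \left\lfloor (p_0 - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$, $\nexists\, p \in \mathcal{P}$ such that $s_1.\mathit{next}_p = x$. Let the transition $(s_2, \mathsf{compNext}_{p_0}, s_2') \in \alpha$, where $s_2 > s_1$, be the first time that action $\mathsf{compNext}_{p_0}$ is executed after state $s_1$. We have that $\forall x \in \mathrm{POOL}_\alpha$ such that $[x]_{\mathrm{POOL}_\alpha} \le \left\lfloor (p_0 - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$, $x \notin s_2.\mathrm{DONE}_{p_0} \cup s_2.\mathrm{TRY}_{p_0}$, since from the discussion above we have that $\forall s \ge s_1$ and $\forall p \in \mathcal{P}_\alpha \setminus \{p_0\}$, $[s.\mathit{next}_p]_{\mathrm{POOL}_\alpha} \ge \left\lfloor (p - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$. Thus $[x_0]_{s_2.\mathrm{FREE}_{p_0} \setminus s_2.\mathrm{TRY}_{p_0}} = [x_0]_{\mathrm{POOL}_\alpha} = \left\lfloor (p_0 - 1) \cdot \frac{|\mathcal{J} \setminus \mathrm{DONE}_\alpha| - (m - 1)}{m} \right\rfloor + 1$. As a result, $s_2'.\mathrm{NEXT}_{p_0} = x_0$. With arguments similar to those in case A.1, we can see that job $x_0$ will be performed by process $p_0$, which is a contradiction, since after state $s_0$ no jobs are executed.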

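Case B $|\mathcal{J} \setminus \mathrm{DONE}_\alpha| < 2m - 1$: Let $x_0 \in \mathrm{POOL}_\alpha$ be the job such that $[x_0]_{\mathrm{POOL}_\alpha} = p_0$. Such $x_0$ exists since $\beta \ge m$ and $|\mathrm{POOL}_\alpha| \ge \beta$. It follows that any $p \in \mathcal{P}_\alpha$ that executes action $\mathsf{compNext}_p$ after state $s_0'$ will have its $\mathrm{NEXT}_p$ variable pointing to a job $x$ with $[x]_{\mathrm{POOL}_\alpha} \ge p$. Thus $\forall p \in \mathcal{P}_\alpha$, $\exists\, s_p' \ge s_0'$ in $\alpha$ such that $\forall$ states $s \ge s_p'$, $[s.\mathit{next}_p]_{\mathrm{POOL}_\alpha} \ge p$. Let $s_0'' = \max_{p \in \mathcal{P}_\alpha}[s_p']$; we have to study 2 cases for $p_0$:

Case B.1) After $s_0''$, process $p_0$ executes action $\mathsf{compNext}_{p_0}$ and the transition leads to a state $s_1 > s_0''$ such that $s_1.\mathrm{NEXT}_{p_0} = x_0$. Since $[x_0]_{\mathrm{POOL}_\alpha} = p_0$ and $p_0 = \min_{p \in \mathcal{P}_\alpha}[p]$, from the previous discussion we have that $\forall s \ge s_1$ and $\forall p \in \mathcal{P} \setminus \{p_0\}$, $s.\mathit{next}_p \neq x_0$. Thus when $p_0$ executes action $\mathsf{check}_{p_0}$ of Figure 1 for the first time after state $s_1$, the condition will be true, so in some subsequent transition $p_0$ will have to execute action $\mathsf{do}_{p_0, x_0}$, performing job $x_0$, which is a contradiction, since after state $s_0$ no jobs are executed.

Case B.2) After $s_0''$, process $p_0$ executes action $\mathsf{compNext}_{p_0}$ and the transition leads to a state $s_1 > s_0''$ such that $s_1.\mathrm{NEXT}_{p_0} > x_0$. Since $p_0 = \min_{p \in \mathcal{P}_\alpha}[p]$, it holds that $\forall x \in \mathrm{POOL}_\alpha$ such that $[x]_{\mathrm{POOL}_\alpha} \le p_0$, $\nexists\, p \in \mathcal{P}$ such that $s_1.\mathit{next}_p = x$. Let the transition $(s_2, \mathsf{compNext}_{p_0}, s_2') \in \alpha$, where $s_2 > s_1$, be the first time that action $\mathsf{compNext}_{p_0}$ is executed after state $s_1$. We have that $\forall x \in \mathrm{POOL}_\alpha$ such that $[x]_{\mathrm{POOL}_\alpha} \le p_0$, $x \notin s_2.\mathrm{DONE}_{p_0} \cup s_2.\mathrm{TRY}_{p_0}$, since from the discussion above we have that $\forall s \ge s_1$ and $\forall p \in \mathcal{P}_\alpha \setminus \{p_0\}$, $[s.\mathit{next}_p]_{\mathrm{POOL}_\alpha} \ge p$. Thus $[x_0]_{s_2.\mathrm{FREE}_{p_0} \setminus s_2.\mathrm{TRY}_{p_0}} = [x_0]_{\mathrm{POOL}_\alpha} = p_0$. As a result, $s_2'.\mathrm{NEXT}_{p_0} = x_0$. With arguments similar to those in case B.1, we can see that job $x_0$ will be performed by process $p_0$, which is a contradiction, since after state $s_0$ no jobs are executed.

$\Box$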
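The two cases of the proof mirror the two branches a process takes when it chooses its next job by rank. The following is a minimal sketch of that choice, written from the conditions used in Cases A and B and not copied from Figure 1. The names `rank` and `comp_next`, the use of Python sets, and the placement of the floor around the whole product are assumptions made for this illustration.

```python
def rank(free, try_set, k):
    """Return the job of rank k (1-based) among the jobs in free but not in try_set."""
    candidates = sorted(set(free) - set(try_set))
    return candidates[k - 1]


def comp_next(p, m, beta, free, try_set):
    """Hypothetical sketch of the job choice made by process p (1 <= p <= m).

    Returns the chosen job, or None when fewer than beta candidates remain,
    which is the situation in which the process would terminate.
    """
    free = set(free)
    try_set = set(try_set)
    if len(free - try_set) < beta:
        return None
    if len(free) >= 2 * m - 1:
        # Case A: processes spread out over the candidates in m intervals.
        # Assumption: floor((p - 1) * (|FREE| - (m - 1)) / m) + 1.
        k = ((p - 1) * (len(free) - (m - 1))) // m + 1
    else:
        # Case B: too few jobs left for intervals, so process p takes rank p.
        k = p
    return rank(free, try_set, k)


if __name__ == "__main__":
    jobs = range(1, 21)  # 20 jobs, none done yet
    print([comp_next(p, 3, 3, jobs, set()) for p in (1, 2, 3)])
```

In this toy run with 20 free jobs and 3 processes, the three processes start at well-separated ranks, which is the property the proof uses to argue that the process with the smallest identifier eventually finds an uncontested job.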
Using the last two lemmas we can find the effectiveness of algorithm $\mathrm{KK}_\beta$.

Theorem 4.4 For any $\beta \ge m$, $f \le m - 1$ algorithm $\mathrm{KK}_\beta$ has effectiveness $E_{\mathrm{KK}_\beta}(n, m, f) = n - (\beta + m - 2)$.

Proof. From Lemma 4.2 we have that any finite execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ with $Do(\alpha) \le n - (\beta + m - 1)$ can be extended, essentially proving that in such an execution no process has terminated. Moreover, from Lemma 4.3 we have that $\mathrm{KK}_\beta$ is wait-free, and thus there exists no infinite fair execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ such that $Do(\alpha) \le n - (\beta + m - 1)$. Since finite fair executions are executions where all non-failed processes have terminated, from the above we have that $E_{\mathrm{KK}_\beta}(n, m, f) \ge n - (\beta + m - 2)$. If all processes but the process with id $m$ fail in an execution $\alpha$ in such a way that $\mathcal{J}_\alpha \cap \mathrm{STUCK}_\alpha = \emptyset$ and $|\mathrm{STUCK}_\alpha| = m - 1$ (where $\mathrm{STUCK}_\alpha$ is defined as in the proof of Lemma 4.3), then there exists an adversarial strategy that can result in $\beta + m - 2$ jobs not having been performed when process $m$ terminates. Such an execution will be a finite fair execution where $n - (\beta + m - 2)$ jobs are performed. From this and the previous claims we have that $E_{\mathrm{KK}_\beta}(n, m, f) = n - (\beta + m - 2)$. $\Box$
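As a quick sanity check of Theorem 4.4, the formula can be evaluated for concrete parameters. The function below is a hypothetical helper written for this illustration; it only evaluates $n - (\beta + m - 2)$ and compares it with the upper bound $n - m + 1$ quoted in the abstract.

```python
def effectiveness(n: int, m: int, beta: int) -> int:
    """Effectiveness of KK_beta according to Theorem 4.4 (requires beta >= m)."""
    if beta < m:
        raise ValueError("Theorem 4.4 assumes beta >= m")
    return n - (beta + m - 2)


def effectiveness_upper_bound(n: int, m: int) -> int:
    """Upper bound n - m + 1 over all algorithms, as stated in the abstract."""
    return n - m + 1


if __name__ == "__main__":
    n, m = 100, 4
    # The smallest admissible beta gives n - 2m + 2.
    print(effectiveness(n, m, beta=m), effectiveness_upper_bound(n, m))
```

Choosing $\beta = m$ yields $n - 2m + 2$, which is within an additive $m - 1$ of the upper bound; larger values of $\beta$, such as the $3m^2$ used in the work analysis, trade effectiveness for fewer collisions.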

5 Work Complexity Analysis 

In this section we prove that for $\beta \ge 3m^2$ algorithm $\mathrm{KK}_\beta$ has work complexity $O(nm \log n \log m)$.

The main idea of the proof is to demonstrate that under the assumption $\beta \ge 3m^2$, process collisions on a job cannot occur without progress being made in the algorithm. In order to prove that, we first demonstrate that if two different processes $p, q$ set their $\mathrm{NEXT}_p$, $\mathrm{NEXT}_q$ internal variables to the same job $i$ in some $\mathsf{compNext}$ actions, then at the enabling states of those actions the $\mathrm{DONE}_p$ and $\mathrm{DONE}_q$ sets of the processes have at least $|q - p|m$ different elements, given that $\beta \ge 3m^2$. Next we prove that if two processes $p, q$ collide three consecutive times while trying to perform some jobs, then the size of the set $\mathrm{DONE}_p \cup \mathrm{DONE}_q$ that processes $p$ and $q$ know has increased by at least $|q - p|m$ elements. This essentially tells us that every three collisions between the same two processes a significant number of jobs has been performed, and thus enough progress has been made. In order to prove the above statement, we need to formally define what we mean by collision, and tie such a collision to some specific state, so that we have a fixed "point" in the execution about which to reason. Finally we use the argument about the progress made if three consecutive collisions happen between two processes $p, q$ in order to prove that a process $p$ cannot collide with a process $q$ more than $2\left\lceil \frac{n}{m|q - p|} \right\rceil$ times in any execution. This is proved by contradiction, showing that if process $p$ collides with process $q$ more than $2\left\lceil \frac{n}{m|q - p|} \right\rceil$ times, there exist states for which the set $\mathrm{DONE}_p \cup \mathrm{DONE}_q$ has more than $n$ elements, which is impossible. The last statement is used in order to prove the main theorem on the work complexity of algorithm $\mathrm{KK}_\beta$ for $\beta \ge 3m^2$. We obtain the main theorem on the work complexity by counting the total number of collisions and the cost of each collision.

We start by proving that if two processes $p, q$ decide, with some $\mathsf{compNext}$ actions, to perform the same job $i$, then their DONE sets at the enabling states of those $\mathsf{compNext}$ actions differ in at least $|q - p|m$ elements.

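Lemma 5.1 If $\beta \ge 3m^2$ and in an execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ there exist states $s_1, t_1$ and processes $p, q \in \mathcal{P}$ with $p < q$ such that $s_1.\mathrm{NEXT}_p = t_1.\mathrm{NEXT}_q = i \in \mathcal{J}$, then there exist transitions $(s_2, \mathsf{compNext}_p, s_2')$, $(t_2, \mathsf{compNext}_q, t_2')$, where $s_2'.\mathrm{NEXT}_p = t_2'.\mathrm{NEXT}_q = i$, $s_2'.\mathrm{STATUS}_p = t_2'.\mathrm{STATUS}_q = \mathit{set\_next}$, such that there exists no action $\pi_1 = \mathsf{compNext}_p$ with $s_2' < \pi_1 < s_1$ and no action $\pi_2 = \mathsf{compNext}_q$ with $t_2' < \pi_2 < t_1$, and $|\overline{s_2.\mathrm{DONE}_p} \cap t_2.\mathrm{DONE}_q| > (q - p)m$ or $|s_2.\mathrm{DONE}_p \cap \overline{t_2.\mathrm{DONE}_q}| > (q - p)m$.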

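Proof. We will prove this by contradiction. From algorithm $\mathrm{KK}_\beta$ there must exist transitions $(s_2, \mathsf{compNext}_p, s_2')$, $(t_2, \mathsf{compNext}_q, t_2')$ where $s_2'.\mathrm{NEXT}_p = i$ and $t_2'.\mathrm{NEXT}_q = i$, such that there exist no actions $\pi_1 = \mathsf{compNext}_p$, $\pi_2 = \mathsf{compNext}_q$ with $s_2' < \pi_1 < s_1$ and $t_2' < \pi_2 < t_1$, if there exist $s_1, t_1 \in \alpha$ and $p, q \in \mathcal{P}$ with $p < q$ such that $s_1.\mathrm{NEXT}_p = t_1.\mathrm{NEXT}_q = i \in \mathcal{J}$, since those are the transitions that set $\mathrm{NEXT}_p$ and $\mathrm{NEXT}_q$ to $i$. So in order to get a contradiction we must assume that $|\overline{s_2.\mathrm{DONE}_p} \cap t_2.\mathrm{DONE}_q| \le (q - p) \cdot m$ and $|s_2.\mathrm{DONE}_p \cap \overline{t_2.\mathrm{DONE}_q}| \le (q - p) \cdot m$.

We will prove that if this is the case then $s_2'.\mathrm{NEXT}_p \neq t_2'.\mathrm{NEXT}_q$.

Let $A = \mathcal{J} \setminus s_2.\mathrm{DONE}_p = s_2.\mathrm{FREE}_p$ and $B = \mathcal{J} \setminus t_2.\mathrm{DONE}_q = t_2.\mathrm{FREE}_q$; thus from the contradiction assumption we have that $|A \cap \overline{B}| \le (q - p) \cdot m$ and $|\overline{A} \cap B| \le (q - p) \cdot m$.

It could either be that $|A| \le |B|$ or $|A| > |B|$.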

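Case 1 $|A| \le |B|$: From the contradiction assumption we have that $|\overline{A} \cap B| \le (q - p) \cdot m$. Thus we have that:

$$|(t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q) \cap \overline{(s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p)}| \le m(q - p) + m - 1 \tag{1}$$

since $s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p$ can have up to $m - 1$ fewer elements than $A$ - the elements of set $s_2.\mathrm{TRY}_p$ - and it can be the case that $s_2.\mathrm{TRY}_p \cap t_2.\mathrm{TRY}_q = \emptyset$.

Moreover, since $s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p \subseteq A$ and $|s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p| \ge \beta \ge 3m^2$, $|A| \ge 3m^2$. Similarly $|B| \ge 3m^2$. We have:

$$(q - 1)\frac{|B|}{m} = (p - 1)\frac{|B|}{m} + (q - p)\frac{|B|}{m} \ge (p - 1)\frac{|A|}{m} + (q - p)\frac{|B|}{m} \tag{2}$$

$$(q - 1)\frac{|B|}{m} \ge (p - 1)\frac{|A|}{m} + 3m(q - p) \tag{3}$$

$$\left\lfloor (q - 1)\frac{|B| - (m - 1)}{m} \right\rfloor + 1 > \left\lfloor (p - 1)\frac{|A| - (m - 1)}{m} \right\rfloor + 1 + 3m(q - p) \tag{4}$$

Since $s_2'.\mathrm{NEXT}_p = t_2'.\mathrm{NEXT}_q = i$, it must be the case that $[i]_{s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p} = \left\lfloor (p - 1)\frac{|A| - (m - 1)}{m} \right\rfloor + 1$ and $[i]_{t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q} = \left\lfloor (q - 1)\frac{|B| - (m - 1)}{m} \right\rfloor + 1$. Equation 4 gives that $[i]_{t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q} > [i]_{s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p} + 3m(q - p)$. This means that $t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q$ must have at least $3m(q - p)$ more elements with rank less than the rank of $i$ than set $s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p$ does. This is a contradiction since from Equation 1 we have that $|(t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q) \cap \overline{(s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p)}| \le m(q - p) + m - 1$.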

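Case 2 $|B| < |A|$: We have that $|A \cap \overline{B}| \le (q - p) \cdot m$ and $|\overline{A} \cap B| \le (q - p) \cdot m$ from the contradiction assumption. Thus we have that:

$$|(t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q) \cap \overline{(s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p)}| \le m(q - p) + m - 1 \tag{5}$$

since $s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p$ can have up to $m - 1$ fewer elements than $A$ - the elements of set $s_2.\mathrm{TRY}_p$ - and it can be the case that $s_2.\mathrm{TRY}_p \cap t_2.\mathrm{TRY}_q = \emptyset$.

From the contradiction assumption and the case 2 assumption we have that $|B| < |A| \le |B| + (q - p)m$. Moreover $|A| \ge \beta \ge 3m^2$ and $|B| \ge \beta \ge 3m^2$. We have:

$$(q - 1)\frac{|B| + (q - p)m}{m} = (p - 1)\frac{|B| + (q - p)m}{m} + (q - p)\frac{|B| + (q - p)m}{m} \ge (p - 1)\frac{|A|}{m} + (q - p)\frac{|B| + (q - p)m}{m}$$

$$(q - 1)\frac{|B| + (q - p)m}{m} \ge (p - 1)\frac{|A|}{m} + 3m(q - p) + (q - p)^2$$

$$(q - 1)\frac{|B|}{m} \ge (p - 1)\frac{|A|}{m} + 3m(q - p) + (q - p)^2 - (q - 1)(q - p)$$

$$(q - 1)\frac{|B|}{m} \ge (p - 1)\frac{|A|}{m} + (3m - p + 1)(q - p)$$

$$(q - 1)\frac{|B|}{m} > (p - 1)\frac{|A|}{m} + 2m(q - p)$$

$$\left\lfloor (q - 1)\frac{|B| - (m - 1)}{m} \right\rfloor + 1 > \left\lfloor (p - 1)\frac{|A| - (m - 1)}{m} \right\rfloor + 1 + 2m(q - p) \tag{6}$$

Since $s_2'.\mathrm{NEXT}_p = t_2'.\mathrm{NEXT}_q = i$, it must be the case that $[i]_{s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p} = \left\lfloor (p - 1)\frac{|A| - (m - 1)}{m} \right\rfloor + 1$ and $[i]_{t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q} = \left\lfloor (q - 1)\frac{|B| - (m - 1)}{m} \right\rfloor + 1$. Equation 6 gives that $[i]_{t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q} > [i]_{s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p} + 2m(q - p)$. This means that $t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q$ must have at least $2m(q - p)$ more elements with rank less than the rank of $i$ than set $s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p$. This is a contradiction since from Equation 5 we have that $|(t_2.\mathrm{FREE}_q \setminus t_2.\mathrm{TRY}_q) \cap \overline{(s_2.\mathrm{FREE}_p \setminus s_2.\mathrm{TRY}_p)}| \le m(q - p) + m - 1$. $\Box$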

Next we are going to prove that if 2 processes $p, q \in \mathcal{P}$ with $p < q$ "collide" three times, their DONE sets at the third collision will contain at least $m(q - p)$ more jobs than they did at the first collision. This will allow us to find an upper bound on the collisions a process may participate in. It is possible that both processes become aware of a collision, or that only one of them does while the other one successfully completes the job. In the proofs that follow, for a state $s$ in execution $\alpha$ we define $s.\mathrm{DONE}$ as the following set: $s.\mathrm{DONE} = \{i \in \mathcal{J} \mid \exists p \in \mathcal{P} \text{ and } j \in \{1, \ldots, n\} : s.\mathit{done}_p(j) = i\}$. We also need the following definitions.

Definition 5.1 In an execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$, we say that process $p$ collided with process $q$ in job $i$ at state $s$, if (i) there exist in $\alpha$ transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(t_1, \mathsf{compNext}_q, t_1')$ and $(s_2, \mathsf{check}_p, s_2')$, with $s_1 < s_2$ and $t_1 < s_2$, $s_1'.\mathrm{NEXT}_p = t_1'.\mathrm{NEXT}_q = s_2.\mathrm{NEXT}_p = i$, $s_1'.\mathrm{STATUS}_p = t_1'.\mathrm{STATUS}_q = \mathit{set\_next}$, $s_2'.\mathrm{STATUS}_p = \mathit{comp\_next}$, and (ii) letting $\alpha'$ be the execution fragment that begins with state $s_1'$ and ends with state $s_2$, there exists no action $\pi_1 = \mathsf{compNext}_p \in \alpha'$ and either there exists in $\alpha'$ a transition $(s, \mathsf{gatherTry}_p, s')$ such that $s.\mathrm{Q}_p = q$, $s.\mathit{next}_q = i$, or a transition $(s, \mathsf{gatherDone}_p, s')$ and $j \in \{1, \ldots, n\}$ such that $s.\mathrm{Q}_p = q$, $s.\mathrm{POS}_p(q) = j$, $s.\mathit{done}_{q,j} = i$ and $i \notin s.\mathrm{TRY}_p$.

According to Definition 5.1, process $p$ collided with process $q$ in job $i$ at state $s$ if process $p$ attempted to perform job $i$ but was not able to, because it detected in state $s$ that either process $q$ was trying to perform job $i$ or process $q$ had already performed job $i$.

Definition 5.2 In an execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$, we say that processes $p, q$ collide in job $i$ at state $s$, if process $p$ collided with process $q$ or process $q$ collided with process $p$ in job $i$ at state $s$, according to Definition 5.1.

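Lemma 5.2 In an execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ for any $\beta \ge m$, if there exist processes $p, q$, jobs $i_1, i_2 \in \mathcal{J}$ and states $\tilde{s}_1 < \tilde{s}_2$ such that process $p$ collided with process $q$ in job $i_1$ at state $\tilde{s}_1$ and in job $i_2$ at state $\tilde{s}_2$ according to Definition 5.1, then there exist transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(s_2, \mathsf{compNext}_p, s_2')$, $(t_1, \mathsf{compNext}_q, t_1')$, $(t_2, \mathsf{compNext}_q, t_2')$ where $s_1'.\mathrm{NEXT}_p = t_1'.\mathrm{NEXT}_q = i_1$, $s_2'.\mathrm{NEXT}_p = t_2'.\mathrm{NEXT}_q = i_2$, $s_1'.\mathrm{STATUS}_p = s_2'.\mathrm{STATUS}_p = t_1'.\mathrm{STATUS}_q = t_2'.\mathrm{STATUS}_q = \mathit{set\_next}$ and there exists no action $\pi_1 = \mathsf{compNext}_p$ for which $s_1' < \pi_1 < \tilde{s}_1$ or $s_2' < \pi_1 < \tilde{s}_2$, such that: $s_1 < s_2$ and $t_1 < t_2$.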

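Proof. From Definition 5.1 we have that there exist transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(s_2, \mathsf{compNext}_p, s_2')$ with $s_1'.\mathrm{NEXT}_p = i_1$, $s_2'.\mathrm{NEXT}_p = i_2$, $s_1'.\mathrm{STATUS}_p = s_2'.\mathrm{STATUS}_p = \mathit{set\_next}$, and there exists no action $\pi_1 = \mathsf{compNext}_p$ for which $s_1' < \pi_1 < \tilde{s}_1$ or $s_2' < \pi_1 < \tilde{s}_2$. From the latter and the fact that $\tilde{s}_1 < \tilde{s}_2$, it must be the case that $s_1 < \tilde{s}_1 < s_2 < \tilde{s}_2$. Furthermore, from Definition 5.1 we have that there exist transitions $(t_1, \mathsf{compNext}_q, t_1')$, $(t_2, \mathsf{compNext}_q, t_2')$ with $t_1'.\mathrm{NEXT}_q = i_1$, $t_2'.\mathrm{NEXT}_q = i_2$, $t_1'.\mathrm{STATUS}_q = t_2'.\mathrm{STATUS}_q = \mathit{set\_next}$, such that $t_1 < \tilde{s}_1$ and $t_2 < \tilde{s}_2$. We can pick those transitions in $\alpha$ in such a way that there exists no other transition between $t_1'$ and $\tilde{s}_1$ that sets $\mathrm{NEXT}_q$ to $i_1$, and similarly there exists no other transition between $t_2'$ and $\tilde{s}_2$ that sets $\mathrm{NEXT}_q$ to $i_2$. We need to prove now that $t_1 < t_2$. We will prove this by contradiction.

Let $t_2 < t_1$. Since $t_1 < \tilde{s}_1$, we have that $t_2 < t_1 < t_1' < \tilde{s}_1 < s_2 < \tilde{s}_2$. Since from Definition 5.1 either $\tilde{s}_1.\mathit{next}_q = i_1$ or there exists $j \in \{1, \ldots, n\}$ such that $\tilde{s}_1.\mathit{done}_{q,j} = i_1$, it must be the case that $\tilde{s}_2.\mathrm{STATUS}_p = \mathit{gather\_done}$, $\tilde{s}_2.\mathrm{Q}_p = q$ and there exists $j' \in \{1, \ldots, n\}$ such that $\tilde{s}_2.\mathit{done}_{q,j'} = i_2$. This means that there exists a transition $(t_3, \mathsf{done}_q, t_3')$ and $j' \in \{1, \ldots, n\}$ such that $t_3'.\mathit{done}_{q,j'} = i_2$ and $t_2 < t_3 < t_1 < t_1' < \tilde{s}_1 < s_2 < \tilde{s}_2$.

If $\tilde{s}_1.\mathrm{STATUS}_p = \mathit{gather\_try}$ then from algorithm $\mathrm{KK}_\beta$ we have that $\tilde{s}_1.\mathrm{DONE} \subseteq s_2.\mathrm{DONE}_p$ and as a result $i_2 \in s_2.\mathrm{DONE}_p$, which is a contradiction since $(s_2, \mathsf{compNext}_p, s_2') \notin \mathit{trans}(\mathrm{KK}_\beta)$ if $i_2 \in s_2.\mathrm{DONE}_p$ and $s_2'.\mathrm{NEXT}_p = i_2$, $s_2'.\mathrm{STATUS}_p = \mathit{set\_next}$.

If $\tilde{s}_1.\mathrm{STATUS}_p = \mathit{gather\_done}$ then from algorithm $\mathrm{KK}_\beta$ we have that $\tilde{s}_1.\mathrm{Q}_p = q$ and there exists $j \in \{1, \ldots, n\}$ such that $\tilde{s}_1.\mathrm{POS}_p(q) = j$ and $\tilde{s}_1.\mathit{done}_{q,j} = i_1$. Since $t_2 < t_3 < t_1 < t_1' < \tilde{s}_1 < s_2 < \tilde{s}_2$ it must be the case that $j' < j$ and as a result $i_2 \in \tilde{s}_1.\mathrm{DONE}_p$. Clearly $\tilde{s}_1.\mathrm{DONE}_p \subseteq s_2.\mathrm{DONE}_p$, which is a contradiction since $(s_2, \mathsf{compNext}_p, s_2') \notin \mathit{trans}(\mathrm{KK}_\beta)$ if $i_2 \in s_2.\mathrm{DONE}_p$ and $s_2'.\mathrm{NEXT}_p = i_2$, $s_2'.\mathrm{STATUS}_p = \mathit{set\_next}$. $\Box$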

Lemma 5.3 In an execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ for any $\beta \ge m$, if there exist processes $p, q$, jobs $i_1, i_2 \in \mathcal{J}$ and states $\tilde{s}_1 < \tilde{s}_2$ such that process $p$ collided with process $q$ in job $i_1$ at state $\tilde{s}_1$ and process $q$ collided with process $p$ in job $i_2$ at state $\tilde{s}_2$ according to Definition 5.1, then there exist transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(s_2, \mathsf{compNext}_p, s_2')$, $(t_1, \mathsf{compNext}_q, t_1')$, $(t_2, \mathsf{compNext}_q, t_2')$ where $s_1'.\mathrm{NEXT}_p = t_1'.\mathrm{NEXT}_q = i_1$, $s_2'.\mathrm{NEXT}_p = t_2'.\mathrm{NEXT}_q = i_2$, $s_1'.\mathrm{STATUS}_p = s_2'.\mathrm{STATUS}_p = t_1'.\mathrm{STATUS}_q = t_2'.\mathrm{STATUS}_q = \mathit{set\_next}$ and there exist no actions $\pi_1 = \mathsf{compNext}_p$, $\pi_2 = \mathsf{compNext}_q$ for which $s_1' < \pi_1 < \tilde{s}_1$, $t_2' < \pi_2 < \tilde{s}_2$, such that:

$s_1 < s_2$ and $t_1 < t_2$.

Proof. From Definition 5.1 we have that there exist transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(s_2, \mathsf{compNext}_p, s_2')$ with $s_1'.\mathrm{NEXT}_p = i_1$, $s_2'.\mathrm{NEXT}_p = i_2$, $s_1'.\mathrm{STATUS}_p = s_2'.\mathrm{STATUS}_p = \mathit{set\_next}$, and there exists no action $\pi_1 = \mathsf{compNext}_p$ for which $s_1' < \pi_1 < \tilde{s}_1$. Furthermore, from Definition 5.1 we have that there exist transitions $(t_1, \mathsf{compNext}_q, t_1')$, $(t_2, \mathsf{compNext}_q, t_2')$ with $t_1'.\mathrm{NEXT}_q = i_1$, $t_2'.\mathrm{NEXT}_q = i_2$, $t_1'.\mathrm{STATUS}_q = t_2'.\mathrm{STATUS}_q = \mathit{set\_next}$, and there exists no action $\pi_2 = \mathsf{compNext}_q$ for which $t_2' < \pi_2 < \tilde{s}_2$. From the latter and the fact that $\tilde{s}_1 < \tilde{s}_2$, it must be the case that $t_1 < \tilde{s}_1 < t_2 < \tilde{s}_2$. We can pick the transitions that are enabled by states $t_1$ and $s_2$ in $\alpha$ in such a way that there exists no other transition between $t_1'$ and $\tilde{s}_1$ that sets $\mathrm{NEXT}_q$ to $i_1$, and similarly there exists no other transition between $s_2'$ and $\tilde{s}_2$ that sets $\mathrm{NEXT}_p$ to $i_2$. We need to prove now that $s_1 < s_2$. We will prove this by contradiction.

Let $s_2 < s_1$. From algorithm $\mathrm{KK}_\beta$ there exist transitions $(s_3, \mathsf{setNext}_p, s_3')$, $(s_4, \mathsf{done}_p, s_4')$ and $(t_3, \mathsf{setNext}_q, t_3')$, where $s_3'.\mathit{next}_p = i_2$, $s_4.\mathit{next}_p = i_2$, $t_3'.\mathit{next}_q = i_1$ and $s_2 < s_3 < s_4 < s_1$, $t_1 < t_3 < t_2$. There are 2 cases, either $s_3 < t_3$ or $t_3 < s_3$.

Case 1 $s_3 < t_3$: We have that $s_3 < t_3 < t_2$ and $(t_2, \mathsf{compNext}_q, t_2')$, where $t_2'.\mathrm{NEXT}_q = i_2$ and $t_2'.\mathrm{STATUS}_q = \mathit{set\_next}$, which means that $i_2 \notin t_2.\mathrm{TRY}_q \cup t_2.\mathrm{DONE}_q$. This is a contradiction since the $t_2.\mathrm{TRY}_q$ and $t_2.\mathrm{DONE}_q$ sets are computed by $\mathsf{gatherTry}_q$ and $\mathsf{gatherDone}_q$ actions that are preceded by state $s_3$. So either $i_2 \in t_2.\mathrm{TRY}_q$ or $i_2 \in t_2.\mathrm{DONE}_q$, since a new $\mathsf{setNext}_p$ action may take place only after state $s_4$.

Case 2 $t_3 < s_3$: We have that $t_3 < s_3 < s_1$ and $(s_1, \mathsf{compNext}_p, s_1')$, where $s_1'.\mathrm{NEXT}_p = i_1$ and $s_1'.\mathrm{STATUS}_p = \mathit{set\_next}$, which means that $i_1 \notin s_1.\mathrm{TRY}_p \cup s_1.\mathrm{DONE}_p$. This is a contradiction since the $s_1.\mathrm{TRY}_p$ and $s_1.\mathrm{DONE}_p$ sets are computed by $\mathsf{gatherTry}_p$ and $\mathsf{gatherDone}_p$ actions that are preceded by state $t_3$. There exists a transition $(s_4, \mathsf{gatherTry}_p, s_4')$ in $\alpha$ with $s_4.\mathrm{Q}_p = q$ such that there exists no $\pi_1 = \mathsf{compNext}_p$ where $s_4 < \pi_1 < s_1$. If $s_4.\mathit{next}_q = i_1$ we have a contradiction since $i_1 \in s_1.\mathrm{TRY}_p$. If $s_4.\mathit{next}_q \neq i_1$ there exists an action $\pi_2 = \mathsf{setNext}_q$ in $\alpha$ such that $t_3 < \pi_2 < s_4$. If this $\pi_2 = \mathsf{setNext}_q$ is preceded by a transition $(t_4, \mathsf{done}_q, t_4')$ with $t_4.\mathrm{NEXT}_q = i_1$, we have a contradiction since $i_1 \in t_4'.\mathrm{DONE}$ and $s_1.\mathrm{DONE}_p$ is computed by $\mathsf{gatherDone}_p$ actions that are preceded by state $t_4'$, which results in $i_1 \in s_1.\mathrm{DONE}_p$. If there exists no such transition we have again a contradiction, since state $\tilde{s}_1$ as defined by Definition 5.1 could not belong in $\alpha$. $\Box$

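Lemma 5.4 If $\beta \ge m$ and in an execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ there exist processes $p \neq q$, jobs $i_1, i_2, i_3 \in \mathcal{J}$ and states $\tilde{s}_1 < \tilde{s}_2 < \tilde{s}_3$ such that processes $p, q$ collide in job $i_1$ at state $\tilde{s}_1$, in job $i_2$ at state $\tilde{s}_2$ and in job $i_3$ at state $\tilde{s}_3$ according to Definition 5.2, then there exist states $s_1 < s_3$ and $t_1 < t_3$ such that

$$s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q \subseteq s_3.\mathrm{DONE}_p \cap t_3.\mathrm{DONE}_q$$
$$|s_3.\mathrm{DONE}_p \cup t_3.\mathrm{DONE}_q| - |s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q| > m \cdot |q - p|$$

Proof. From Definitions 5.1, 5.2 we have that there exist transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(s_2, \mathsf{compNext}_p, s_2')$, $(s_3, \mathsf{compNext}_p, s_3')$ and $(t_1, \mathsf{compNext}_q, t_1')$, $(t_2, \mathsf{compNext}_q, t_2')$, $(t_3, \mathsf{compNext}_q, t_3')$, where $s_1'.\mathrm{NEXT}_p = t_1'.\mathrm{NEXT}_q = i_1$, $s_2'.\mathrm{NEXT}_p = t_2'.\mathrm{NEXT}_q = i_2$, $s_3'.\mathrm{NEXT}_p = t_3'.\mathrm{NEXT}_q = i_3$, $s_1'.\mathrm{STATUS}_p = s_2'.\mathrm{STATUS}_p = s_3'.\mathrm{STATUS}_p = t_1'.\mathrm{STATUS}_q = t_2'.\mathrm{STATUS}_q = t_3'.\mathrm{STATUS}_q = \mathit{set\_next}$ and $s_1 < \tilde{s}_1$, $t_1 < \tilde{s}_1$, $s_2 < \tilde{s}_2$, $t_2 < \tilde{s}_2$, and $s_3 < \tilde{s}_3$, $t_3 < \tilde{s}_3$. We pick from $\alpha$ the transitions $(s_1, \mathsf{compNext}_p, s_1')$, $(t_1, \mathsf{compNext}_q, t_1')$ in such a way that there exists no other $\mathsf{compNext}_p$, $\mathsf{compNext}_q$ between states $s_1, \tilde{s}_1$ respectively $t_1, \tilde{s}_1$ that sets $\mathrm{NEXT}_p$ respectively $\mathrm{NEXT}_q$ to $i_1$. We can pick in a similar manner the transitions for jobs $i_2, i_3$. From Lemmas 5.2, 5.3 and Definitions 5.1, 5.2 we have that $s_1 < s_2 < s_3$ and $t_1 < t_2 < t_3$. We will first prove that:

$$s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q \subseteq s_3.\mathrm{DONE}_p \cap t_3.\mathrm{DONE}_q$$

From algorithm $\mathrm{KK}_\beta$ we have that there exist in $\alpha$ transitions $(s_4, \mathsf{setNext}_p, s_4')$, $(t_4, \mathsf{setNext}_q, t_4')$ with $s_4'.\mathit{next}_p = i_2$, $t_4'.\mathit{next}_q = i_2$ and there exists no action $\pi_1 = \mathsf{compNext}_p$ such that $s_2' < \pi_1 < s_4$, and no action $\pi_2 = \mathsf{compNext}_q$ such that $t_2' < \pi_2 < t_4$. We will prove that $t_1 < s_4$ and $s_1 < t_4$. We start by proving that $t_1 < s_4$. In order to get a contradiction we assume that $s_4 < t_1$. From algorithm $\mathrm{KK}_\beta$ we have that there exists in $\alpha$ a transition $(t_5, \mathsf{gatherTry}_q, t_5')$ with $t_5.\mathrm{Q}_q = p$, and there exists no action $\pi_2 = \mathsf{compNext}_q$ such that $t_5 < \pi_2 < t_2$. We have that $s_4 < t_1 < t_5 < t_2$ and $i_2 \notin t_2.\mathrm{TRY}_q \cup t_2.\mathrm{DONE}_q$. If $t_5.\mathit{next}_p = i_2$ we have a contradiction since $i_2 \in t_2.\mathrm{TRY}_q$. If $t_5.\mathit{next}_p \neq i_2$ there exists an action $\pi_3 = \mathsf{setNext}_p$ in $\alpha$ such that $s_4 < \pi_3 < t_5$. If this $\pi_3 = \mathsf{setNext}_p$ is preceded by a transition $(s_5, \mathsf{done}_p, s_5')$ with $s_5.\mathrm{NEXT}_p = i_2$, we have a contradiction since $i_2 \in s_5'.\mathrm{DONE}$ and $t_2.\mathrm{DONE}_q$ is computed by $\mathsf{gatherDone}_q$ actions that are preceded by state $s_5'$, which results in $i_2 \in t_2.\mathrm{DONE}_q$. If there exists no such transition we have again a contradiction, since state $\tilde{s}_2$ as defined by Definition 5.2 could not belong in $\alpha$.

From the discussion above we have that $t_1 < s_4$. Thus $t_1.\mathrm{DONE}_q \subseteq s_4.\mathrm{DONE}$; moreover $s_3.\mathrm{DONE}_p$ is computed by $\mathsf{gatherDone}_p$ actions that are preceded by state $s_4$, from which we have that $t_1.\mathrm{DONE}_q \subseteq s_3.\mathrm{DONE}_p$. It is easy to see that $s_1.\mathrm{DONE}_p \subseteq s_3.\mathrm{DONE}_p$ holds, thus we have that $s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q \subseteq s_3.\mathrm{DONE}_p$. With similar arguments as before, we can prove that $s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q \subseteq t_3.\mathrm{DONE}_q$, which gives us that $s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q \subseteq s_3.\mathrm{DONE}_p \cap t_3.\mathrm{DONE}_q$.

Now it only remains to prove that:

$$|s_3.\mathrm{DONE}_p \cup t_3.\mathrm{DONE}_q| - |s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q| > m \cdot |q - p|$$

If $p < q$, from Lemma 5.1 we have that $|\overline{s_3.\mathrm{DONE}_p} \cap t_3.\mathrm{DONE}_q| > (q - p)m$ or $|s_3.\mathrm{DONE}_p \cap \overline{t_3.\mathrm{DONE}_q}| > (q - p)m$. In either case the union $s_3.\mathrm{DONE}_p \cup t_3.\mathrm{DONE}_q$ contains more than $(q - p)m$ elements that lie outside the intersection $s_3.\mathrm{DONE}_p \cap t_3.\mathrm{DONE}_q$. Since $s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q \subseteq s_3.\mathrm{DONE}_p \cap t_3.\mathrm{DONE}_q$, we have that:

$$|s_3.\mathrm{DONE}_p \cup t_3.\mathrm{DONE}_q| - |s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q| > (q - p) \cdot m$$

If $q < p$, with similar arguments we have that:

$$|s_3.\mathrm{DONE}_p \cup t_3.\mathrm{DONE}_q| - |s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q| > (p - q) \cdot m$$

Combining the above we have:

$$|s_3.\mathrm{DONE}_p \cup t_3.\mathrm{DONE}_q| - |s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q| > m \cdot |q - p|$$
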

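$\Box$

Lemma 5.5 If $\beta \ge 3m^2$ there exists no execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ at which process $p$ collided with process $q$ in more than $2\left\lceil \frac{n}{m|q - p|} \right\rceil$ states according to Definition 5.1.

Proof. Let execution $\alpha \in \mathit{execs}(\mathrm{KK}_\beta)$ be an execution at which process $p$ collided with process $q$ in at least $2\left\lceil \frac{n}{m|q - p|} \right\rceil + 1$ states. Let us examine the first $2\left\lceil \frac{n}{m|q - p|} \right\rceil + 1$ such states. Let those states be $\tilde{s}_1 < \tilde{s}_2 < \ldots < \tilde{s}_{2\lceil n/(m|q - p|) \rceil} < \tilde{s}_{2\lceil n/(m|q - p|) \rceil + 1}$. From Lemma 5.2 we have that there exist states $s_1 < s_2 < \ldots < s_{2\lceil n/(m|q - p|) \rceil} < s_{2\lceil n/(m|q - p|) \rceil + 1}$ that enable the $\mathsf{compNext}_p$ actions and states $t_1 < t_2 < \ldots < t_{2\lceil n/(m|q - p|) \rceil} < t_{2\lceil n/(m|q - p|) \rceil + 1}$ that enable the $\mathsf{compNext}_q$ actions that lead to the collisions in states $\tilde{s}_1 < \tilde{s}_2 < \ldots < \tilde{s}_{2\lceil n/(m|q - p|) \rceil} < \tilde{s}_{2\lceil n/(m|q - p|) \rceil + 1}$. Then from Lemma 5.4 we have that $\forall i \in \left\{1, \ldots, \left\lceil \frac{n}{m|q - p|} \right\rceil\right\}$:

$$|s_{2i+1}.\mathrm{DONE}_p \cup t_{2i+1}.\mathrm{DONE}_q| - |s_{2i-1}.\mathrm{DONE}_p \cup t_{2i-1}.\mathrm{DONE}_q| > m|q - p| \tag{7}$$

$$|s_{2i+1}.\mathrm{DONE}_p \cup t_{2i+1}.\mathrm{DONE}_q| - |s_1.\mathrm{DONE}_p \cup t_1.\mathrm{DONE}_q| > i\,m|q - p| \tag{8}$$

$$|s_{2i+1}.\mathrm{DONE}_p \cup t_{2i+1}.\mathrm{DONE}_q| > i\,m|q - p| \tag{9}$$

From Equation 9 we have that:

$$\left|s_{2\lceil n/(m|q - p|) \rceil + 1}.\mathrm{DONE}_p \cup t_{2\lceil n/(m|q - p|) \rceil + 1}.\mathrm{DONE}_q\right| > m|q - p| \left\lceil \frac{n}{m|q - p|} \right\rceil \ge n \tag{10}$$

Equation 10 leads to a contradiction since $s_{2\lceil n/(m|q - p|) \rceil + 1}.\mathrm{DONE}_p \cup t_{2\lceil n/(m|q - p|) \rceil + 1}.\mathrm{DONE}_q \subseteq \mathcal{J}$ and $|\mathcal{J}| = n$.

$\Box$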
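The bound of Lemma 5.5 and the counting behind Equation 10 are easy to evaluate for concrete values. The sketch below is an illustration only: `max_collisions` and `growth_after` are hypothetical helper names, and processes are assumed to be identified by integers `1..m`.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def max_collisions(n: int, m: int, p: int, q: int) -> int:
    """Upper bound 2 * ceil(n / (m * |q - p|)) on collisions of p with q (Lemma 5.5)."""
    if p == q:
        raise ValueError("p and q must be distinct processes")
    return 2 * ceil_div(n, m * abs(q - p))


def growth_after(n: int, m: int, p: int, q: int) -> int:
    """Value m * |q - p| * ceil(n / (m * |q - p|)) that Equation 10 compares with n."""
    d = abs(q - p)
    return m * d * ceil_div(n, m * d)


if __name__ == "__main__":
    print(max_collisions(100, 4, 1, 3), growth_after(100, 4, 1, 3))
```

For `n = 100`, `m = 4`, `p = 1`, `q = 3` the bound is 26 collisions, and one more collision would force the union of the two DONE sets above 104 elements, which cannot fit in a job set of size 100.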



Theorem 5.6 If P > Sm'^ algorithm KK/j has work complexity Wkkh = O (nm log n log in). 

Proof. We start with the observation that in any execution $\alpha$ of algorithm $\mathrm{KK}_\beta$, if there exist a process $p$, a job $i$, a transition $(s_1, \mathrm{done}_p, s_2)$ and $j \in \{1, \ldots, n\}$ such that $s_1.\mathrm{POS}_p(p) = j$ and $s_1.\mathrm{NEXT}_p = i$, then for any process $q \neq p$ there exists at most one transition $(t_1, \mathrm{gatherDone}_q, t_2)$ in $\alpha$ with $t_1.Q_q = p$, $t_1.\mathrm{POS}_q(p) = j$ and $t_1 > s_1$. Such a transition performs exactly one read operation from shared memory, one insertion into the set $\mathrm{DONE}_q$ and one removal from the set $\mathrm{FREE}_q$; thus it costs $O(\log n)$ work. Clearly there exist at most $m - 1$ such transitions for each $\mathrm{done}_p$. From Lemma 4.1, over all processes there can be at most $n$ $\mathrm{done}_p$ actions in any execution $\alpha$ of algorithm $\mathrm{KK}_\beta$. Each $\mathrm{done}_p$ action performs one write operation in shared memory, one insertion into the set $\mathrm{DONE}_p$ and one removal from the set $\mathrm{FREE}_p$; thus such an action costs $O(\log n)$ work. Furthermore, any $\mathrm{done}_p$ is preceded by $m - 1$ $\mathrm{gatherTry}_p$ read actions, which read the next array and each add at most one element to the set $\mathrm{TRY}_p$ at cost $O(\log n)$, and by $m - 1$ $\mathrm{gatherDone}_p$ read actions that do not add elements to the $\mathrm{DONE}_p$ set. Note that we have already counted the $\mathrm{gatherDone}_p$ read actions that result in adding jobs to the $\mathrm{DONE}_p$ set. Finally, any $\mathrm{done}_p$ action is preceded by one $\mathrm{compNext}_p$ action. This action is dominated by the cost of the $\mathrm{rank}(\mathrm{FREE}_p, \mathrm{TRY}_p, i)$ function, which has cost $O(m \log n)$ if the sets $\mathrm{FREE}_p$, $\mathrm{TRY}_p$ are represented with an efficient tree structure that allows insertion, deletion and search of an element in $O(\log n)$. We discussed in Section 3 what such tree structures could be. This gives a total bound of $O(nm \log n)$ work associated with the $\mathrm{done}_p$ actions.

If a process $p$ collided with a process $q$ in job $i$ at state $s$, we have an extra $\mathrm{compNext}_p$ action, $m - 1$ extra $\mathrm{gatherTry}_p$ read actions and insertions into the $\mathrm{TRY}_p$ set, and $m - 1$ $\mathrm{gatherDone}_p$ read actions that do not add elements to the $\mathrm{DONE}_p$ set. Thus each collision costs $O(m \log n)$ work. Since $\beta \geq 3m^2$, from Lemma 5.5 we have that for two distinct processes $p$, $q$, in any execution $\alpha$ of algorithm $\mathrm{KK}_\beta$ there exist fewer than $2\lceil n/(m|q-p|) \rceil$ collisions. For process $p$, if we count all such collisions with any other process $q$, we get:

$$2\sum_{q \in \mathcal{P}-\{p\}} \left\lceil \frac{n}{m|q-p|} \right\rceil < 2(m-1) + \frac{2n}{m} \sum_{q \in \mathcal{P}-\{p\}} \frac{1}{|q-p|} \leq 2(m-1) + \frac{4n}{m} \sum_{i=1}^{\lceil m/2 \rceil} \frac{1}{i} \leq 2(m-1) + \frac{4n}{m} \log m \qquad (11)$$

If we count the total number of collisions for all the $m$ processes, we get that if $\beta \geq 3m^2$, in any execution of algorithm $\mathrm{KK}_\beta$ there can be at most $2m^2 + 4n \log m < 4(n + 1) \log m$ collisions (since $n > \beta$). Thus collisions cost $O(nm \log n \log m)$ work. Finally, any process $p$ that fails may add to the work complexity less than $O(m \log n)$ work from its $\mathrm{compNext}_p$ action and from reads (if the process fails without performing a $\mathrm{done}_p$ action after its latest $\mathrm{compNext}_p$ action). So for the work complexity of algorithm $\mathrm{KK}_\beta$, if $\beta \geq 3m^2$, we have that $W_{\mathrm{KK}_\beta} = O(nm \log n \log m)$. □
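The per-process collision count used in the proof can be sanity-checked numerically. The sketch below is illustrative only: it assumes process identifiers 1, ..., m, takes the per-pair bound 2⌈n/(m|q−p|)⌉ from Lemma 5.5, sums it over all q ≠ p, and compares the result with the closed form 2(m−1) + (4n/m) log m. The function names and the use of base-2 logarithms are assumptions, not part of the paper.

```python
import math


def pairwise_collision_bound(n: int, m: int, p: int, q: int) -> int:
    """Bound 2*ceil(n / (m*|q-p|)) on collisions between processes p and q."""
    distance = abs(q - p)
    # Integer ceiling division avoids floating-point rounding.
    return 2 * -(-n // (m * distance))


def collisions_for_process(n: int, m: int, p: int) -> int:
    """Sum of the pairwise bounds over every other process q."""
    return sum(
        pairwise_collision_bound(n, m, p, q)
        for q in range(1, m + 1)
        if q != p
    )


def closed_form_bound(n: int, m: int) -> float:
    """Closed form 2(m-1) + (4n/m) log m, with log taken base 2 (assumption)."""
    return 2 * (m - 1) + (4 * n / m) * math.log2(m)
```

For example, with n = 12 and m = 2 the single pair contributes 2⌈12/2⌉ = 12, while the closed form evaluates to 2 + 24 = 26.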

6 An Asymptotically Work Optimal Algorithm 

Here we demonstrate how to use algorithm $\mathrm{KK}_\beta$ with $\beta = 3m^2$, if $m = O(\sqrt[3+\epsilon]{n/\log n})$, in order to solve the at-most-once problem with effectiveness $n - O(m^2 \log n \log m)$ and work complexity $O(n + m^{3+\epsilon} \log n)$, for any constant $\epsilon > 0$ such that $1/\epsilon$ is a positive integer. We construct algorithm IterativeKK($\epsilon$) (Fig. 2), which performs iterative calls to a variation of $\mathrm{KK}_\beta$ that we call IterStepKK. IterativeKK($\epsilon$) has $3 + 1/\epsilon$ distinct done matrices in shared memory, with different granularities. One done matrix stores the regular jobs performed, while the remaining $2 + 1/\epsilon$ matrices store super-jobs. Super-jobs are groups of consecutive jobs. Of these matrices, one stores super-jobs of size $m \log n \log m$, while the remaining $1 + 1/\epsilon$ matrices store super-jobs of size $m^{1-i\epsilon} \log n \log^{1+i} m$, for $i \in \{1, \ldots, 1/\epsilon\}$.

IterativeKK(ε) for process p:

00  size_{p,1} ← 1
01  size_{p,2} ← m log n log m
02  FREE_p ← map(J, size_{p,1}, size_{p,2})
03  FREE_p ← IterStepKK(FREE_p, size_{p,2})
04  for (i ← 1, i ≤ 1/ε, i++)
05      size_{p,1} ← size_{p,2}
06      size_{p,2} ← m^{1−iε} log n log^{1+i} m
07      FREE_p ← map(FREE_p, size_{p,1}, size_{p,2})
08      FREE_p ← IterStepKK(FREE_p, size_{p,2})
09  endfor
10  size_{p,1} ← size_{p,2}
11  size_{p,2} ← 1
12  FREE_p ← map(FREE_p, size_{p,1}, size_{p,2})
13  FREE_p ← IterStepKK(FREE_p, size_{p,2})

Figure 2: Algorithm IterativeKK(ε): pseudocode
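To make the granularity schedule concrete, the following sketch lists the successive values assigned to size_{p,2} in lines 01, 06 and 11 of the pseudocode. It is only an illustration: `inv_eps` stands for 1/ε, logarithms are taken base 2, and the rounding to integer sizes that a real implementation would need is ignored; all three choices are assumptions.

```python
import math


def superjob_sizes(n: int, m: int, inv_eps: int) -> list[float]:
    """Successive super-job sizes used by the iterated algorithm.

    inv_eps is 1/epsilon and is assumed to be a positive integer.
    """
    log_n = math.log2(n)
    log_m = math.log2(m)

    # Line 01: the first call works on super-jobs of size m log n log m.
    sizes = [m * log_n * log_m]

    # Lines 04-09: iteration i uses size m^(1 - i*eps) log n log^(1+i) m.
    for i in range(1, inv_eps + 1):
        sizes.append(m ** (1 - i / inv_eps) * log_n * log_m ** (1 + i))

    # Line 11: the final call works on individual jobs.
    sizes.append(1.0)
    return sizes
```

With n = 2^20, m = 256 and ε = 1/2, the schedule is 40960, 20480, 10240, 1; the last loop iteration has size log n · log^{1+1/ε} m because the power of m vanishes.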

The algorithm IterStepKK differs from $\mathrm{KK}_\beta$ in three ways. First, all instances of IterStepKK work for $\beta = 3m^2$. Second, IterStepKK has a termination flag in shared memory. This termination flag is initially 0 and is set to 1 by any process that decides to terminate. Any process that discovers that $|\mathrm{FREE}_p \setminus \mathrm{TRY}_p| < 3m^2$ in its $\mathrm{compNext}_p$ action sets the termination flag to 1, computes new $\mathrm{FREE}_p$ and $\mathrm{TRY}_p$ sets, returns the set $\mathrm{FREE}_p \setminus \mathrm{TRY}_p$ and terminates the current iteration. Any process $p$ that checks whether it is safe to perform a job checks the termination flag first; if the flag is 1, the process, instead of performing the job, computes new $\mathrm{FREE}_p$ and $\mathrm{TRY}_p$ sets, returns the set $\mathrm{FREE}_p \setminus \mathrm{TRY}_p$ and terminates the current iteration. Finally, IterStepKK takes as inputs the variable size and a set $\mathrm{SET}_1$, such that $|\mathrm{SET}_1| > 3m^2$, and returns the set $\mathrm{SET}_2$ as output. $\mathrm{SET}_1$ contains super-jobs of size size. In IterStepKK, with an action $\mathrm{do}_{p,j}$ process $p$ performs all the jobs of super-job $j$. IterStepKK performs as many super-jobs as it can and returns in $\mathrm{SET}_2$ the super-jobs for which it can verify that no process will perform them upon the termination of algorithm IterStepKK. In IterativeKK($\epsilon$) we also use the function $\mathrm{SET}_2 = \mathrm{map}(\mathrm{SET}_1, \mathrm{size}_1, \mathrm{size}_2)$, which takes the set of super-jobs $\mathrm{SET}_1$, with super-jobs of size $\mathrm{size}_1$, and maps it to a set of super-jobs $\mathrm{SET}_2$ of size $\mathrm{size}_2$.
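The text does not spell out how map() regroups jobs, so the following is one plausible reading rather than the authors' definition. It assumes that a super-job is represented as a half-open range (start, end) of consecutive job identifiers, that the input ranges do not overlap, and that the source size is implicit in the ranges (so only the target size is passed). The name `map_superjobs` is hypothetical.

```python
def map_superjobs(superjobs, new_size):
    """Regroup half-open job ranges into ranges of at most new_size jobs.

    superjobs: iterable of non-overlapping (start, end) pairs.
    Returns a sorted list of (start, end) pairs.
    """
    # Step 1: merge adjacent ranges into maximal runs of consecutive jobs.
    runs = []
    for start, end in sorted(superjobs):
        if runs and runs[-1][1] == start:
            runs[-1][1] = end  # extend the current run
        else:
            runs.append([start, end])

    # Step 2: cut every run into chunks of the target size.
    regrouped = []
    for start, end in runs:
        for chunk_start in range(start, end, new_size):
            regrouped.append((chunk_start, min(chunk_start + new_size, end)))
    return regrouped
```

Under these assumptions the same routine covers both directions: grouping single jobs into larger super-jobs (as in line 02) and splitting remaining super-jobs into finer ones (as in lines 07 and 12). The last chunk of a run may be shorter than the target size.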

Theorem 6.1 Algorithm IterativeKK($\epsilon$) has work complexity $W_{\mathrm{IterativeKK}(\epsilon)} = O(n + m^{3+\epsilon} \log n)$ and effectiveness $E_{\mathrm{IterativeKK}(\epsilon)}(n, m, f) = n - O(m^2 \log n \log m)$.

Proof. In order to determine the effectiveness and work complexity of algorithm IterativeKK($\epsilon$), we compute the jobs performed by, and the work spent in, each invocation of IterStepKK. Moreover, we compute the work added by the invocations of the map() function. The first invocation of map() in line 02 can be completed by process $p$ with work $O\left(\frac{n}{m \log n \log m} \log n\right)$, since process $p$ needs to construct a tree with $\frac{n}{m \log n \log m}$ elements. Over all processes this contributes $O\left(\frac{n}{\log m}\right)$ work. From Theorem 5.6 we have that IterStepKK in line 03 has total work $O\left(n + \frac{n}{m \log n \log m} \, m \log n \log m\right) = O(n)$, where the first $n$ comes from do actions and the second term from the work complexity of Theorem 5.6. Note that we count $O(1)$ work for each normal job executed by a do action on a super-job. That means that in the invocation of IterStepKK in line 03, do actions cost $m \log n \log m$ work. Moreover, from Theorem 4.4 we have effectiveness $\frac{n}{m \log n \log m} - (3m^2 + m - 2)$ on the super-jobs of size $m \log n \log m$. Of the super-jobs not completed, up to $m - 1$ may be contained in the $\mathrm{TRY}_p$ sets upon termination in line 03. Since those super-jobs are not added to (and thus are ignored in) the output $\mathrm{FREE}_p$ set in line 03, up to $(m - 1) m \log n \log m$ jobs may not be performed by IterativeKK($\epsilon$). The set $\mathrm{FREE}_p$ returned by algorithm IterStepKK in line 03 has no more than $3m^2 + m - 2$ super-jobs of size $m \log n \log m$.

In each repetition of the loop in lines 04-09, the map() function in line 07 constructs a $\mathrm{FREE}_p$ set with at most $O(m^{2+\epsilon}/\log m)$ elements, which costs $O(m^{2+\epsilon})$ per process $p$, for a total of $O(m^{3+\epsilon})$ work for all processes. Moreover, from Theorem 5.6, each invocation of IterStepKK in line 08 costs $O(3m^3 \log n \log m + m^{3+\epsilon} \log m) < O(m^{3+\epsilon} \log n)$ work, where the term $3m^3 \log n \log m$ is an upper bound on the work needed for the do actions on the super-jobs. From Theorem 4.4 we have that each output $\mathrm{FREE}_p$ set in line 08 has at most $3m^2 + m - 2$ super-jobs. Moreover, from each invocation of IterStepKK in line 08 at most $m - 1$ super-jobs are lost in TRY sets. Those account for fewer than $(m - 1) m \log n \log m$ jobs in each iteration, since the size of the super-jobs in the iterations of the loop in lines 04-09 is strictly less than $m \log n \log m$.

When we leave the loop in lines 04-09, we have a $\mathrm{FREE}_p$ set with at most $3m^2 + m - 2$ super-jobs of size $\log n \log^{1+1/\epsilon} m$, which means that in line 12 function map() will return a set $\mathrm{FREE}_p$ with fewer than $(3m^2 + m - 2)(\log n \log^{1+1/\epsilon} m)$ elements that correspond to jobs and not super-jobs. This costs for all processes a total of $O(m^3 \log m \log\log n \log\log m) < O(m^{3+\epsilon} \log n)$, since $\epsilon$ is a constant. Finally, we have that IterStepKK in line 13 has, from Theorem 5.6, work $O(m^3 \log^2 m \log\log n \log\log m) < O(m^{3+\epsilon} \log n)$; also, from Theorem 4.4 it has effectiveness $(3m^2 + m - 2)(\log n \log^{1+1/\epsilon} m) - (3m^2 + m - 2)$.

If we add up all the work, we have that $W_{\mathrm{IterativeKK}(\epsilon)} = O(n + m^{3+\epsilon} \log n)$, since the loop in lines 04-09 repeats $1 + 1/\epsilon$ times and $\epsilon$ is a constant. Moreover, for the effectiveness, at most $(m - 1) m \log n \log m$ jobs will be lost in the TRY sets at line 03. After that, strictly fewer than $(m - 1) m \log n \log m$ jobs will be lost in the TRY sets in each iteration of the loop in lines 04-09, and fewer than $3m^2 + m - 2$ jobs will be lost from the effectiveness of the last invocation of IterStepKK in line 13. Thus we have that $E_{\mathrm{IterativeKK}(\epsilon)}(n, m, f) = n - O(m^2 \log n \log m)$. □
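The effectiveness accounting in the proof can be summed up in a few lines of arithmetic. The sketch below adds the three sources of lost jobs named above: the TRY sets after line 03, the TRY sets in each of the 1 + 1/ε loop repetitions counted in the proof, and the final invocation in line 13. It is only an illustration; `inv_eps` stands for 1/ε, logarithms are taken base 2, and the function name is hypothetical.

```python
import math


def lost_jobs_bound(n: int, m: int, inv_eps: int) -> float:
    """Upper bound on jobs left unperformed, following the proof's accounting."""
    log_n = math.log2(n)
    log_m = math.log2(m)

    # Up to m - 1 super-jobs of size at most m log n log m stay in TRY sets.
    per_call = (m - 1) * m * log_n * log_m

    first_call = per_call                  # line 03
    loop_calls = (1 + inv_eps) * per_call  # loop in lines 04-09
    last_call = 3 * m * m + m - 2          # line 13, individual jobs

    return first_call + loop_calls + last_call
```

For n = 2^20, m = 4 and ε = 1/2 this gives 480 + 1440 + 50 = 1970 jobs, so under these assumptions at least n − 1970 jobs are performed; since ε is a constant, the total is O(m² log n log m).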

For any $m = O(\sqrt[3+\epsilon]{n/\log n})$, algorithm IterativeKK($\epsilon$) is work optimal and asymptotically effectiveness optimal.

6.1 An Asymptotically Optimal Work Complexity Algorithm for the Write-All Problem

WA_IterativeKK(ε) for process p:

00  size_{p,1} ← 1
01  size_{p,2} ← m log n log m
02  FREE_p ← map(J, size_{p,1}, size_{p,2})
03  FREE_p ← WA_IterStepKK(FREE_p, size_{p,2})
04  for (i ← 1, i ≤ 1/ε, i++)
05      size_{p,1} ← size_{p,2}
06      size_{p,2} ← m^{1−iε} log n log^{1+i} m
07      FREE_p ← map(FREE_p, size_{p,1}, size_{p,2})
08      FREE_p ← WA_IterStepKK(FREE_p, size_{p,2})
09  endfor
10  size_{p,1} ← size_{p,2}
11  size_{p,2} ← 1
12  FREE_p ← map(FREE_p, size_{p,1}, size_{p,2})
13  FREE_p ← WA_IterStepKK(FREE_p, size_{p,2})
14  for (i ∈ FREE_p)
15      do_{p,i}
16  endfor

Figure 3: Algorithm WA_IterativeKK(ε): pseudocode

Based on IterativeKK($\epsilon$) we construct algorithm WA_IterativeKK($\epsilon$) (Fig. 3), which solves the Write-All problem [14] with work complexity $O(n + m^{3+\epsilon} \log n)$, for any constant $\epsilon > 0$ such that $1/\epsilon$ is a positive integer. Following Kanellakis and Shvartsman [14], the Write-All problem for the shared memory model consists of: "Using m processors write 1's to all locations of an array of size n." Algorithm WA_IterativeKK($\epsilon$) differs from IterativeKK($\epsilon$) in two ways. First, it uses a modified version of IterStepKK that, upon termination, returns the set $\mathrm{FREE}_p$ instead of the set $\mathrm{FREE}_p \setminus \mathrm{TRY}_p$. Let us name this modified version WA_IterStepKK. Second, in WA_IterativeKK($\epsilon$), after line 13 process $p$, instead of terminating, executes all tasks in the set $\mathrm{FREE}_p$. Note that since we are interested in the Write-All problem, when process $p$ performs a job $i$ with action $\mathrm{do}_{p,i}$, process $p$ just writes 1 in the $i$-th position of the Write-All array $wa[1, \ldots, n]$ in shared memory.
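The final loop in lines 14-16 is safe for Write-All because writing 1 to a cell is idempotent, so several processes may perform the same job. A minimal single-threaded sketch of that step follows; it uses 0-based indices instead of the paper's wa[1, ..., n], and the names `do_write` and `finish_write_all` are hypothetical.

```python
def do_write(wa: list[int], i: int) -> None:
    """Stand-in for action do_{p,i}: write 1 in position i of the array."""
    wa[i] = 1


def finish_write_all(wa: list[int], free_p: set[int]) -> None:
    """Lines 14-16: process p performs every job left in its FREE_p set."""
    for i in sorted(free_p):
        do_write(wa, i)


# Two processes with overlapping FREE sets; repeated writes are harmless.
wa = [1, 1, 0, 0, 1, 0, 0, 1]
finish_write_all(wa, {2, 3, 5})
finish_write_all(wa, {3, 5, 6})
```

In this toy run both processes write cells 3 and 5, and the array still ends up with every cell set to 1. This is why Write-All can drop the at-most-once restriction that forces IterativeKK(ε) to leave the TRY sets unperformed.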

Theorem 6.2 Algorithm WA_IterativeKK($\epsilon$) solves the Write-All problem with work complexity $W_{\mathrm{WA\_IterativeKK}(\epsilon)} = O(n + m^{3+\epsilon} \log n)$.

Proof (of Theorem 6.2). We prove this with arguments similar to those in the proof of Theorem 6.1. As in the proof of Theorem 6.1, after each invocation of WA_IterStepKK the output set $\mathrm{FREE}_p$ has fewer than $3m^2 + m - 1$ super-jobs, from Theorem 4.4. The difference is that now we do not leave jobs in the $\mathrm{TRY}_p$ sets, since we are not interested in maintaining the at-most-once property between successive invocations of the WA_IterStepKK algorithm. Since after each invocation of WA_IterStepKK the output set $\mathrm{FREE}_p$ has the same upper bound on super-jobs as in IterativeKK($\epsilon$), with arguments similar to those in the proof of Theorem 6.1 we have that at line 13 the total work performed by all processes is $O(n + m^{3+\epsilon} \log n)$. Moreover, from Theorem 4.4 the output $\mathrm{FREE}_p$ set in line 13 has fewer than $3m^2 + m - 2$ jobs. This gives us for all processes a total work of $O(m^{3+\epsilon})$ for the loop in lines 14-16. After the loop in lines 14-16 all jobs have been performed, since we left no TRY sets behind; thus algorithm WA_IterativeKK($\epsilon$) solves the Write-All problem with work complexity $W_{\mathrm{WA\_IterativeKK}(\epsilon)} = O(n + m^{3+\epsilon} \log n)$. □

For any $m = O(\sqrt[3+\epsilon]{n/\log n})$, algorithm WA_IterativeKK($\epsilon$) is work optimal.